Arrays for detecting nucleic acids

ABSTRACT

The present invention provides methods and apparatus for sequencing, fingerprinting and mapping biological macromolecules, typically biological polymers. The methods make use of a plurality of sequence specific recognition reagents which can also be used for classification of biological samples, and to characterize their sources.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This is a continuation of application Ser. No. 09/670,563, filed Sep. 27, 2000; which is a continuation of application Ser. No. 09/362,089, filed Jul. 28, 1999; which is a divisional of application Ser. No. 09/056,927, filed Apr. 8, 1998, pending; which is a continuation of application Ser. No. 08/670,118, filed Jun. 25, 1996, now U.S. Pat. No. 5,800,992; which is a divisional of application Ser. No. 08/168,904, filed Dec. 15, 1993; which is a continuation of application Ser. No. 07/624,114, filed Dec. 6, 1990, now abandoned; which is a continuation-in-part of commonly assigned application Ser. No. 07/492,462, filed Mar. 7, 1990, now U.S. Pat. No. 5,143,854; and application Ser. No. 07/362,901, filed Jun. 7, 1989, now abandoned, which are hereby incorporated by reference.

[0002] Additional commonly assigned application Ser. Nos. 07/624,120 and 07/626,730, both of which were filed on Dec. 6, 1990; application Ser. No. 07/435,316, filed Nov. 13, 1989, now abandoned; and U.S. Pat. No. 5,252,743 are also hereby incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0003] The present invention relates to the sequencing, fingerprinting, and mapping of polymers, particularly biological polymers. The inventions may be applied, for example, in the sequencing, fingerprinting, or mapping of nucleic acids, polypeptides, oligosaccharides, and synthetic polymers.

[0004] The relationship between structure and function of macromolecules is of fundamental importance in the understanding of biological systems. These relationships are important to understanding, for example, the functions of enzymes, structural proteins, and signalling proteins, ways in which cells communicate with each other, as well as mechanisms of cellular control and metabolic feedback.

[0005] Genetic information is critical in continuation of life processes. Life is substantially informationally based and its genetic content controls the growth and reproduction of the organism and its complements. Polypeptides, which are critical features of all living systems, are encoded by the genetic material of the cell. In particular, the properties of enzymes, functional proteins, and structural proteins are determined by the sequence of amino acids which make them up. As structure and function are integrally related, many biological functions may be explained by elucidating the underlying structural features which provide those functions. For this reason, it has become very important to determine the genetic sequences of nucleotides which encode the enzymes, structural proteins, and other effectors of biological functions. In addition to segments of nucleotides which encode polypeptides, there are many nucleotide sequences which are involved in control and regulation of gene expression.

[0006] The human genome project is directed toward determining the complete sequence of the genome of the human organism. Although such a sequence would not correspond to the sequence of any specific individual, it would provide significant information as to the general organization and specific sequences contained within segments from particular individuals. It would also provide mapping information which is very useful for further detailed studies. However, the need for highly rapid, accurate, and inexpensive sequencing technology is nowhere more apparent than in a demanding sequencing project such as this. To complete the sequencing of a human genome would require the determination of approximately 3×10⁹, or 3 billion, base pairs.

[0007] The procedures typically used today for sequencing include the Sanger dideoxy method, see, e.g., Sanger et al. (1977) Proc. Natl. Acad. Sci. USA, 74:5463-5467, or the Maxam and Gilbert method, see, e.g., Maxam et al., (1980) Methods in Enzymology, 65:499-559. The Sanger method utilizes enzymatic elongation procedures with chain terminating nucleotides. The Maxam and Gilbert method uses chemical reactions exhibiting specificity of reaction to generate nucleotide specific cleavages. Both methods require a practitioner to perform a large number of complex manual manipulations. These manipulations usually require isolating homogeneous DNA fragments, elaborate and tedious preparing of samples, preparing a separating gel, applying samples to the gel, electrophoresing the samples into this gel, working up the finished gel, and analyzing the results of the procedure.

[0008] Thus, a less expensive, highly reliable, and labor efficient means for sequencing biological macromolecules is needed. A substantial reduction in cost and increase in speed of nucleotide sequencing would be very much welcomed. In particular, an automated system would improve the reproducibility and accuracy of procedures. The present invention satisfies these and other needs.

SUMMARY OF THE INVENTION

[0009] The present invention provides improved methods useful for de novo sequencing of an unknown polymer sequence, for verification of known sequences, for fingerprinting polymers, and for mapping homologous segments within a sequence. By reducing the number of manual manipulations required and automating most of the steps, the speed, accuracy, and reliability of these procedures are greatly enhanced.

[0010] The production of a substrate having a matrix of positionally defined regions with attached reagents exhibiting known recognition specificity can be used for the sequence analysis of a polymer. Although most directly applicable to sequencing, the present invention is also applicable to fingerprinting, mapping, and general screening of specific interactions. The VLSIPS™ Technology (Very Large Scale Immobilized Polymer Synthesis) substrates will be applied to evaluating other polymers, e.g., carbohydrates, polypeptides, hydrocarbon synthetic polymers, and the like. For these non-polynucleotides, the sequence specific reagents will usually be antibodies specific for a particular subunit sequence.

[0011] According to one aspect of the masking technique, the invention provides an ordered method for forming a plurality of polymer sequences by sequential addition of reagents comprising the step of serially protecting and deprotecting portions of the plurality of polymer sequences for addition of other portions of the polymer sequences using a binary synthesis strategy.

[0012] The present invention also provides a means to automate sequencing manipulations. The automation of the substrate production method and of the scan and analysis steps minimizes the need for human intervention. This simplifies the tasks and promotes reproducibility.

[0013] The present invention provides a composition comprising a plurality of positionally distinguishable sequence specific reagents attached to a solid substrate, which reagents are capable of specifically binding to a predetermined subunit sequence of a preselected multi-subunit length having at least three subunits, said reagents representing substantially all possible sequences of said preselected length. In some embodiments, the subunit sequence is a polynucleotide or a polypeptide; in others, the preselected multi-subunit length is five subunits and the subunit sequence is a polynucleotide sequence. In other embodiments, the specific reagent is an oligonucleotide of at least about five nucleotides. Alternatively, the specific reagent is a monoclonal antibody. Usually the specific reagents are all attached to a single solid substrate, and the reagents comprise about 3000 different sequences. In other embodiments, the reagents represent at least about 25% of the possible subsequences of said preselected length. Usually, the reagents are localized in regions of the substrate having a density of at least 25 regions per square centimeter, and often the substrate has a surface area of less than about 4 square centimeters.
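The relationship between preselected length, number of possible sequences, coverage, and substrate area can be illustrated with simple arithmetic. The sketch below is only an illustration; the function names are hypothetical and the numeric inputs in the usage section are examples chosen for this sketch, not values asserted by the specification.

```python
def possible_sequences(alphabet_size, length):
    """Number of distinct subunit sequences of the given length."""
    return alphabet_size ** length


def coverage(n_reagents, alphabet_size, length):
    """Fraction of all possible sequences represented by n_reagents."""
    return n_reagents / possible_sequences(alphabet_size, length)


def area_needed_cm2(n_regions, regions_per_cm2):
    """Substrate area needed to hold n_regions at a given region density."""
    return n_regions / regions_per_cm2


if __name__ == "__main__":
    # Four nucleotides: count all possible n-mers for a few lengths.
    for n in (3, 5, 6, 8):
        print(n, possible_sequences(4, n))
    # Twenty amino acids: all possible tripeptides.
    print(possible_sequences(20, 3))
    # Illustrative only: area for all pentanucleotides at 25 regions/cm^2.
    print(area_needed_cm2(possible_sequences(4, 5), 25))
```

Under these assumptions, the number of regions grows as a power of the preselected length, so the achievable region density largely determines which lengths fit on a single small substrate.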

[0014] The present invention also provides methods for analyzing a sequence of a polynucleotide or a polypeptide, said method comprising the step of:

[0015] a) exposing said polynucleotide or polypeptide to a composition as described.

[0016] It also provides useful methods for identifying or comparing a target sequence with a reference, said method comprising the steps of:

[0017] a) exposing said target sequence to a composition as described;

[0018] b) determining the pattern of positions of the reagents which specifically interact with the target sequence; and

[0019] c) comparing the pattern with the pattern exhibited by the reference when exposed to the composition.
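Steps b) and c) above may be sketched as a comparison of two sets of interacting positions. This is a minimal sketch: representing a pattern as a set of (row, column) positions and scoring agreement with the Jaccard index are assumptions made here for illustration; the specification does not prescribe a particular comparison measure.

```python
def pattern_similarity(target_positions, reference_positions):
    """Jaccard similarity between two sets of interacting positions.

    Returns 1.0 for identical patterns and 0.0 for patterns with no
    position in common. Two empty patterns are treated as identical.
    """
    target = set(target_positions)
    reference = set(reference_positions)
    union = target | reference
    if not union:
        return 1.0
    return len(target & reference) / len(union)


if __name__ == "__main__":
    # Hypothetical patterns of (row, column) positions that gave a signal.
    target = {(0, 1), (2, 3), (4, 4)}
    reference = {(0, 1), (2, 3), (5, 0)}
    print(pattern_similarity(target, reference))
```

A threshold on the score could then be chosen per application, in keeping with the variable stringency of matching discussed elsewhere in the specification.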

[0020] The present invention also provides methods for sequencing a segment of a polynucleotide comprising the steps of:

[0021] a) combining:

[0022] i) a substrate comprising a plurality of chemically synthesized and positionally distinguishable oligonucleotides capable of recognizing defined oligonucleotide sequences; and

[0023] ii) a target polynucleotide; thereby forming high fidelity matched duplex structures of complementary subsequences of known sequence; and

[0024] b) determining which of said reagents have specifically interacted with subsequences in said target polynucleotide.

[0025] In one embodiment, the segment is substantially the entire length of said polynucleotide.

[0026] The invention also provides methods for sequencing a polymer, said method comprising the steps of:

[0027] a) preparing a plurality of reagents which each specifically bind to a subsequence of preselected length;

[0028] b) positionally attaching each of said reagents to one or more solid phase substrates, thereby producing substrates of positionally definable sequence specific probes;

[0029] c) combining said substrates with a target polymer whose sequence is to be determined; and

[0030] d) determining which of said reagents have specifically interacted with subsequences in said target polymer.

[0031] In one embodiment, the substrates are beads. Preferably, the plurality of reagents comprise substantially all possible subsequences of said preselected length found in said target. In another embodiment, the solid phase substrate is a single substrate having attached thereto reagents recognizing substantially all possible subsequences of preselected length found in said target.

[0032] In another embodiment, the method further comprises the step of analyzing a plurality of said recognized subsequences to assemble a sequence of said target polymer. In a bead embodiment, at least some of the plurality of substrates have one subsequence specific reagent attached thereto, and the substrates are coded to indicate the sequence specificity of said reagent.

[0033] The present invention also embraces a method of using a fluorescent nucleotide to detect interactions with oligonucleotide probes of known sequence, said method comprising:

[0034] a) attaching said nucleotide to a target unknown polynucleotide sequence, and

[0035] b) exposing said target polynucleotide sequence to a collection of positionally defined oligonucleotide probes of known sequences to determine the sequences of said probes which interact with said target.

[0036] In a further refinement, an additional step is included of:

[0037] a) collating said known sequences to determine the overlaps of said known sequences to determine the sequence of said target sequence.

[0038] A method of mapping a plurality of sequences relative to one another is also provided, the method comprising:

[0039] a) preparing a substrate having a plurality of positionally attached sequence specific probes;

[0040] b) exposing each of said sequences to said substrate, thereby determining the patterns of interaction between said sequence specific probes and said sequences; and

[0041] c) determining the relative locations of said sequence specific probe interactions on said sequences to determine the overlaps and order of said sequences.

[0042] In one refinement, the sequence specific probes are oligonucleotides, applicable to where the target sequences are nucleic acid sequences.
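The mapping steps above can be sketched by treating each sequence (e.g., a clone) as the set of probes it interacts with and ordering sequences by how many probes they share. This is a simplified illustration under stated assumptions: the greedy ordering heuristic, the function names, and the example clone data are all hypothetical, and real data with noise or repeats would call for a more careful method.

```python
from itertools import combinations


def shared_probes(hits):
    """Count probes shared by each pair of sequences.

    hits maps a sequence name to the set of probe identifiers that
    interacted with it.
    """
    return {
        frozenset((a, b)): len(hits[a] & hits[b])
        for a, b in combinations(hits, 2)
    }


def order_by_overlap(hits):
    """Greedy linear ordering of sequences by shared probe interactions.

    Starts from the sequence with the least total overlap (a likely end
    of the map) and repeatedly appends the unplaced sequence sharing the
    most probes with the last one placed.
    """
    names = list(hits)
    if not names:
        return []

    def total_overlap(name):
        return sum(len(hits[name] & hits[m]) for m in names if m != name)

    order = [min(names, key=total_overlap)]
    remaining = set(names) - {order[0]}
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda m: len(hits[last] & hits[m]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Hypothetical clones and the probe identifiers each one hybridized to.
    clones = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}, "C": {5, 6, 7, 8}}
    print(shared_probes(clones))
    print(order_by_overlap(clones))
```

The ordering is determined only up to reversal, since shared probes alone do not indicate which end of the map is which.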

[0043] In the nucleic acid sequencing application, the steps of the sequencing process comprise:

[0044] a) producing a matrix substrate having known positionally defined regions of known sequence specific oligonucleotide probes;

[0045] b) hybridizing a target polynucleotide to the positions on the matrix so that each of the positions which contain oligonucleotide probes complementary to a sequence on the target hybridize to the target molecule;

[0046] c) detecting which positions have bound the target, thereby determining sequences which are found on the target; and

[0047] d) analyzing the known sequences contained in the target to determine sequence overlaps and assembling the sequence of the target therefrom.

[0048] The enablement of the sequencing process by hybridization is based in large part upon the ability to synthesize a large number of the possible overlapping sequence segments (e.g., enough to virtually saturate them), to distinguish those probes which hybridize with fidelity from those which have mismatched bases, and to analyze a highly complex pattern of hybridization results to determine the overlap regions.

[0049] The detecting of the positions which bind the target sequence would typically be through a fluorescent label on the target. Although a fluorescent label is probably most convenient, other sorts of labels, e.g., radioactive, enzyme linked, optically detectable, or spectroscopic labels may be used. Because the oligonucleotide probes are positionally defined, the location of the hybridized duplex will directly translate to the sequences which hybridize. Thus, analysis of the positions provides a collection of subsequences found within the target sequence. These subsequences are matched with respect to their overlaps so as to assemble an intact target sequence.
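The translation from position to sequence described above can be sketched with an idealized model. The sketch assumes perfect hybridization (a position gives a signal exactly when the subsequence it recognizes occurs in the target) and, for simplicity, stores at each position the target subsequence recognized there rather than the complementary probe itself. The grid layout and function names are hypothetical.

```python
from itertools import product


def build_layout(k, n_cols):
    """Map each (row, col) position to the k-mer recognized there.

    All 4**k k-mers are laid out row by row in lexicographic order.
    """
    layout = {}
    for i, letters in enumerate(product("ACGT", repeat=k)):
        layout[divmod(i, n_cols)] = "".join(letters)
    return layout


def hybridize(target, layout, k):
    """Idealized detection: positions whose k-mer occurs in the target."""
    present = {target[i:i + k] for i in range(len(target) - k + 1)}
    return {pos for pos, kmer in layout.items() if kmer in present}


def decode(positions, layout):
    """Translate detected positions back into recognized subsequences."""
    return sorted(layout[pos] for pos in positions)


if __name__ == "__main__":
    layout = build_layout(3, 8)          # 64 trinucleotides on an 8x8 grid
    lit = hybridize("ACGTAC", layout, 3)
    print(sorted(lit))
    print(decode(lit, layout))
```

The decoded collection of subsequences is the input to the overlap analysis; real signals would additionally require thresholding and mismatch discrimination, which this sketch omits.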

BRIEF DESCRIPTION OF THE DRAWINGS

[0050] FIG. 1 illustrates a flow chart for sequence, fingerprint, or mapping analysis.

[0051] FIGS. 2A-M illustrate the process of a VLSIPS™ Technology trinucleotide synthesis.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0052] I. Overall Description

[0053] A. general

[0054] B. VLSIPS substrates

[0055] C. binary masking

[0056] D. applications

[0057] E. detection methods and apparatus

[0058] F. data analysis

[0059] II. Theoretical Analysis

[0060] A. simple n-mer structure; theory

[0061] B. complications

[0062] C. non-polynucleotide embodiments

[0063] III. Polynucleotide Sequencing

[0064] A. preparation of substrate matrix

[0065] B. labeling target polynucleotide

[0066] C. hybridization conditions

[0067] D. detection; VLSIPS scanning

[0068] E. analysis

[0069] F. substrate reuse

[0070] G. non-polynucleotide aspects

[0071] IV. Fingerprinting

[0072] A. general

[0073] B. preparation of substrate matrix

[0074] C. labeling target nucleotides

[0075] D. hybridization conditions

[0076] E. detection; VLSIPS scanning

[0077] F. analysis

[0078] G. substrate reuse

[0079] H. non-polynucleotide aspects

[0080] V. Mapping

[0081] A. general

[0082] B. preparation of substrate matrix

[0083] C. labeling

[0084] D. hybridization/specific interaction

[0085] E. detection

[0086] F. analysis

[0087] G. substrate reuse

[0088] H. non-polynucleotide aspects

[0089] VI. Additional Screening

[0090] A. specific interactions

[0091] B. sequence comparisons

[0092] C. categorizations

[0093] D. statistical correlations

[0094] VII. Formation of Substrate

[0095] A. instrumentation

[0096] B. binary masking

[0097] C. synthetic methods

[0098] D. surface immobilization

[0099] VIII. Hybridization/Specific Interaction

[0100] A. general

[0101] B. important parameters

[0102] IX. Detection Methods

[0103] A. labeling techniques

[0104] B. scanning system

[0105] X. Data Analysis

[0106] A. general

[0107] B. hardware

[0108] C. software

[0109] XI. Substrate Reuse

[0110] A. removal of label

[0111] B. storage and preservation

[0112] C. processes to avoid degradation of oligomers

[0113] XII. Integrated Sequencing Strategy

[0114] A. initial mapping strategy

[0115] B. selection of smaller clones

[0116] C. actual sequencing procedures

[0117] XIII. Commercial Applications

[0118] A. sequencing

[0119] B. fingerprinting

[0120] C. mapping

[0121] I. Overall Description

[0122] A. General

[0123] The present invention relies in part on the ability to synthesize or attach specific recognition reagents at known locations on a substrate, typically a single substrate. In particular, the present invention provides the ability to prepare a substrate having a very high density matrix pattern of positionally defined specific recognition reagents. The reagents are capable of interacting with their specific targets while attached to the substrate, e.g., solid phase interactions, and by appropriate labeling of these targets, the sites of the interactions between the target and the specific reagents may be derived. Because the reagents are positionally defined, the sites of the interactions will define the specificity of each interaction. As a result, a map of the patterns of interactions with specific reagents on the substrate is convertible into information on the specific interactions taking place, e.g., the recognized features. Where the specific reagents recognize a large number of possible features, this system allows the determination of the combination of specific interactions which exist on the target molecule. Where the number of features is sufficiently large, the identical combination, or pattern, of features is sufficiently unlikely that a particular target molecule may often be uniquely defined by its features. In the extreme, the features may actually be the subunit sequence of the target molecule, and a given target sequence may be uniquely defined by its combination of features.

[0124] In particular, the methodology is applicable to sequencing polynucleotides. The specific sequence recognition reagents will typically be oligonucleotide probes which hybridize with specificity to subsequences found on the target sequence. A sufficiently large number of those probes allows the fingerprinting of a target polynucleotide or the relative mapping of a collection of target polynucleotides, as described in greater detail below.

[0125] In the high resolution fingerprinting provided by a saturating collection of probes which include all possible subsequences of a given size, e.g., 10-mers, all the detected subsequences are collated and their specific overlaps determined, and the entire sequence can usually be reconstructed.
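Reconstruction from overlaps can be sketched for the simplest case. The following assumes an error-free set of detected k-mers from a linear target in which every (k-1)-mer occurs only once, so that each extension step is unambiguous; it raises an error rather than guessing when that assumption fails. The function names are hypothetical, and this greedy walk is one illustrative approach, not necessarily the analysis used in practice.

```python
def spectrum(target, k):
    """Set of all k-mers occurring in the target (idealized detection)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(kmers):
    """Rebuild a linear sequence from its k-mers by unique overlaps."""
    kmers = set(kmers)
    if not kmers:
        raise ValueError("no subsequences supplied")
    k = len(next(iter(kmers)))
    if k < 2 or any(len(s) != k for s in kmers):
        raise ValueError("subsequences must share one length of at least 2")
    # The starting k-mer is the one whose prefix is no other k-mer's suffix.
    suffixes = {s[1:] for s in kmers}
    starts = [s for s in kmers if s[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("start is ambiguous or the sequence is circular")
    seq = starts[0]
    used = {seq}
    while True:
        tail = seq[-(k - 1):]
        candidates = [s for s in kmers - used if s[:-1] == tail]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("ambiguous overlap; repeated (k-1)-mer")
        seq += candidates[0][-1]
        used.add(candidates[0])
    if used != kmers:
        raise ValueError("some subsequences could not be placed")
    return seq


if __name__ == "__main__":
    print(reconstruct(spectrum("ATGCGTAC", 4)))
```

Longer probes reduce the chance of repeated (k-1)-mers, which is one reason a given probe length can usually, but not always, reconstruct a target.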

[0126] Although a polynucleotide sequence analysis is a preferred embodiment, for which the specific reagents are most easily accessible, the invention is also applicable to analysis of other polymers, including polypeptides, carbohydrates, and synthetic polymers, including α-, β-, and ω-amino acids, polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, polyacetates, and mixed polymers. Various optical isomers, e.g., various D- and L-forms of the monomers, may be used.

[0127] Sequence analysis will take the form of complete sequence determination, to the level of the sequence of individual subunits along the entire length of the target sequence. Sequence analysis also takes the form of sequence homology, e.g., less than absolute subunit resolution, where “similarity” in the sequence will be detectable, or the form of selective sequences of homology interspersed at specific or irregular locations.

[0128] In either case, the sequence is determinable at selective resolution or at particular locations. Thus, the hybridization method will be useful as a means for identification, e.g., a “fingerprint”, much like a Southern hybridization method is used. It is also useful to map particular target sequences.

[0129] B. VLSIPS™ Technology

[0130] The invention is enabled by the development of technology to prepare substrates on which specific reagents may be either positionally attached or synthesized. In particular, the very large scale immobilized polymer synthesis (VLSIPS™) technology allows for the very high density production of an enormous diversity of reagents mapped out in a known matrix pattern on a substrate. These reagents specifically recognize subsequences in a target polymer and bind thereto, producing a map of positionally defined regions of interaction. These map positions are convertible into the actual features recognized, which would thus be present in the target molecule of interest.

[0131] As indicated, the sequence specific recognition reagents will often be oligonucleotides which hybridize with fidelity and discrimination to the target sequence. For use with other polymers, monoclonal or polyclonal antibodies having high sequence specificity will often be used.

[0132] In the generic sense, the VLSIPS technology allows the production of a substrate with a high density matrix of positionally mapped regions with specific recognition reagents attached at each distinct region. By use of protective groups which can be positionally removed, or added, the regions can be activated or deactivated for addition of particular reagents or compounds. Details of the protection are described below and in related Pirrung et al. (1992) U.S. Pat. No. 5,143,854. In a preferred embodiment, photosensitive protecting agents will be used and the regions of activation or deactivation may be controlled by electro-optical and optical methods, similar to many of the processes used in semiconductor wafer and chip fabrication.

[0133] In the nucleic acid nucleotide sequencing application, a VLSIPS substrate is synthesized having positionally defined oligonucleotide probes. See Pirrung et al. (1992) U.S. Pat. No. 5,143,854; and U.S. Ser. No. 07/624,120, now abandoned. By use of masking technology and photosensitive synthetic subunits, the VLSIPS apparatus allows for the stepwise synthesis of polymers according to a positionally defined matrix pattern. Each oligonucleotide probe will be synthesized at known and defined positional locations on the substrate. This forms a matrix pattern of known relationship between position and specificity of interaction. The VLSIPS technology allows a very large number of different oligonucleotide probes to be simultaneously and automatically synthesized, including numbers in excess of about 10², 10³, 10⁴, 10⁵, 10⁶, or even more, and at densities of at least about 10²/cm², 10³/cm², 10⁴/cm², 10⁵/cm² and up to 10⁶/cm² or more. This application discloses methods for synthesizing polymers on a silicon or other suitably derivatized substrate, methods and chemistry for synthesizing specific types of biological polymers on those substrates, apparatus for scanning and detecting whether interaction has occurred at specific locations on the substrate, and various other technologies related to the use of a high density very large scale immobilized polymer substrate. In particular, sequencing, fingerprinting, and mapping applications are discussed herein in detail, though related technologies are described in simultaneously filed applications U.S. Ser. No. 07/624,120, now abandoned; and U.S. Ser. No. 07/517,659; Dower et al. (1995) U.S. Pat. No. 5,427,908, each of which is hereby incorporated herein by reference.

[0134] In other embodiments, antibody probes will be generated which specifically recognize particular subsequences found on a polymer. Antibodies would be generated which are specific for recognizing a three contiguous amino acid sequence, and monoclonal antibodies may be preferred. Optimally, these antibodies would not recognize any sequences other than the specific three amino acid stretch desired and the binding affinity should be insensitive to flanking or remote sequences found on a target molecule. Likewise, antibodies specific for particular carbohydrate linkages or sequences will be generated. A similar approach could be used for preparing specific reagents which recognize other polymer subunit sequences. These reagents would typically be site specifically localized to a substrate matrix pattern where the regions are closely packed.

[0135] These reagents could be individually attached at specific sites on the substrate in a matrix by an automated procedure where the regions are positionally targeted by some other specific mechanism, e.g., one which would allow the entire collection of reagents to be attached to the substrate in a single reaction. Each reagent could be separately attached to a specific oligonucleotide sequence by an automated procedure. This would produce a collection of reagents where, e.g., each monoclonal antibody would have a unique oligonucleotide sequence attached to it. By virtue of a VLSIPS substrate which has different complementary oligonucleotides synthesized on it, each monoclonal antibody would specifically be bound only at that site on the substrate where the complementary oligonucleotide has been synthesized. A crosslinking step would fix the reagent to the substrate. See, e.g., Dattagupta et al. (1985) U.S. Pat. No. 4,542,102 and (1987) U.S. Pat. No. 4,713,326; and Chatterjee, M. et al. (1990) J. Am. Chem. Soc. 112:6397-6399, which are hereby incorporated herein by reference. This allows a high density positionally specific collection of specific recognition reagents, e.g., monoclonal antibodies, to be immobilized to a solid substrate using an automated system.
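The oligonucleotide-directed placement described above can be sketched as a lookup by complementarity. In this illustration each reagent carries a tag oligonucleotide and is assigned to the substrate position whose synthesized oligonucleotide is the reverse complement of that tag. The reagent names, tag sequences, and positions are hypothetical, and exact Watson-Crick complementarity is assumed for simplicity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_sites(tags, substrate):
    """Assign each tagged reagent to its complementary substrate position.

    tags maps a reagent name to its attached oligonucleotide tag.
    substrate maps a position to the oligonucleotide synthesized there.
    Reagents with no complementary position are left unassigned.
    """
    position_of = {oligo: pos for pos, oligo in substrate.items()}
    placement = {}
    for name, tag in tags.items():
        partner = reverse_complement(tag)
        if partner in position_of:
            placement[position_of[partner]] = name
    return placement


if __name__ == "__main__":
    # Hypothetical antibody tags and substrate oligonucleotides.
    tags = {"mAb1": "AACG", "mAb2": "GGGA"}
    substrate = {(0, 0): "CGTT", (0, 1): "TCCC"}
    print(assign_sites(tags, substrate))
```

Because every tag is unique, the whole collection could in principle be applied in a single reaction, with each reagent locating its own site.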

[0136] The regions which define particular reagents will usually be generated by selective protecting groups which may be activated or deactivated. Typically the protecting group will be bound to a monomer subunit or spatial region, and can be spatially affected by an activator, such as electromagnetic radiation. Examples of protective groups with utility herein include nitroveratryl oxycarbonyl (NVOC), nitrobenzyl oxycarbonyl (NBOC), dimethyl dimethoxy benzyloxy carbonyl, 5-bromo-7-nitroindolinyl, O-hydroxy-α-methyl cinnamoyl, and 2-oxymethylene anthraquinone. Examples of activators include ion beams, electric fields, magnetic fields, electron beams, x-ray, and other forms of electromagnetic radiation.

[0137] C. Binary Masking

[0138] In fact, the means for producing a substrate useful for these techniques are explained in Pirrung et al. (1992) U.S. Pat. No. 5,143,854, which is hereby incorporated herein by reference. However, there are various particular ways to optimize the synthetic processes. Many of these methods are described in Ser. No. 07/624,120, now abandoned.

[0139] Briefly, the binary synthesis strategy refers to an ordered strategy for parallel synthesis of diverse polymer sequences by sequential addition of reagents which may be represented by a reactant matrix, and a switch matrix, the product of which is a product matrix. A reactant matrix is a 1×n matrix of the building blocks to be added. The switch matrix is all or a subset of the binary numbers from 1 to n arranged in columns. In preferred embodiments, a binary strategy is one in which at least two successive steps illuminate half of a region of interest on the substrate. In most preferred embodiments, binary synthesis refers to a synthesis strategy which also factors a previous addition step. An example is a strategy in which a switch matrix for a masking strategy halves regions that were previously illuminated, illuminating about half of the previously illuminated region and protecting the remaining half (while also protecting about half of previously protected regions and illuminating about half of previously protected regions). It will be recognized that binary rounds may be interspersed with non-binary rounds and that only a portion of a substrate may be subjected to a binary scheme, but will still be considered to be a binary masking scheme within the definition herein. A binary “masking” strategy is a binary synthesis which uses light to remove protective groups from materials for addition of other materials such as nucleotides or amino acids.
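The reactant matrix, switch matrix, and product matrix may be illustrated as follows. This is a sketch under stated assumptions: each row of the switch matrix is taken to be one addition step, each column one region, and an entry of 1 means the region is illuminated (deprotected) and receives that step's building block. The columns are taken here to be all n-bit binary numbers, which is an assumption about the intended range, and the function names are hypothetical.

```python
from itertools import product


def switch_matrix(n_steps):
    """Switch matrix with one row per step and one column per region.

    Columns enumerate every n_steps-bit binary number, so each row
    illuminates exactly half of the regions.
    """
    columns = list(product((0, 1), repeat=n_steps))
    return [[col[step] for col in columns] for step in range(n_steps)]


def product_matrix(reactants, switches):
    """Polymer formed in each region after all addition steps.

    A region gains a step's building block only if its switch entry for
    that step is 1. Blocks are written in the order they were added.
    """
    n_regions = len(switches[0])
    products = [""] * n_regions
    for reactant, row in zip(reactants, switches):
        for region, illuminated in enumerate(row):
            if illuminated:
                products[region] += reactant
    return products


if __name__ == "__main__":
    reactants = ["A", "B", "C", "D"]          # 1 x n reactant matrix
    switches = switch_matrix(len(reactants))
    for row in switches:
        print(row)
    print(product_matrix(reactants, switches))
```

With n addition steps this scheme yields 2^n distinct products, including the empty product and the full-length polymer, which reflects the efficiency of halving the illuminated and protected regions at each step.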

[0140] In particular, this procedure provides a simplified and highly efficient method for saturating all possible sequences of a defined length polymer. This masking strategy is also particularly useful in producing all possible oligonucleotide sequence probes of a given length.

[0141] D. Applications

[0142] The technology provided by the present invention has very broad applications. Although described specifically for polynucleotide sequences, similar sequencing, fingerprinting, mapping, and screening procedures can be applied to polypeptide, carbohydrate, or other polymers. In particular, the present invention may be used to completely sequence a given target sequence to subunit resolution. This may be for de novo sequencing, or may be used in conjunction with a second sequencing procedure to provide independent verification. See, e.g., (1988) Science 242:1245. For example, a large polynucleotide sequence defined by either the Maxam and Gilbert technique or by the Sanger technique may be verified by using the present invention.

[0143] In addition, by selection of appropriate probes, a polynucleotide sequence can be fingerprinted. Fingerprinting is a less detailed sequence analysis which usually involves the characterization of a sequence by a combination of defined features. Sequence fingerprinting is particularly useful because the repertoire of possible features which can be tested is virtually infinite. Moreover, the stringency of matching is also variable depending upon the application. A Southern Blot analysis may be characterized as a means of simple fingerprint analysis.

[0144] Fingerprinting analysis may be performed to the resolution of specific nucleotides, or may be used to determine homologies, most commonly for large segments. In particular, an array of oligonucleotide probes of virtually any workable size may be positionally localized on a matrix and used to probe a sequence for either absolute complementary matching, or homology to the desired level of stringency using selected hybridization conditions.
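The distinction between absolute matching and homology at a chosen stringency can be sketched by counting mismatches. Modeling stringency as a maximum number of tolerated mismatched positions is a simplifying assumption made here for illustration; actual stringency depends on hybridization conditions such as temperature and salt. As a further simplification, the probe is represented by the target subsequence it recognizes, and the function names are hypothetical.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def probe_hits(recognized, target, max_mismatches=0):
    """Start positions in target matched by a probe at a given stringency.

    recognized is the subsequence the probe is designed to detect.
    max_mismatches=0 models absolute matching; larger values model
    lower stringency (homology).
    """
    k = len(recognized)
    return [
        i
        for i in range(len(target) - k + 1)
        if mismatches(recognized, target[i:i + k]) <= max_mismatches
    ]


if __name__ == "__main__":
    target = "TTACGTTT"
    print(probe_hits("ACGT", target))                      # exact match
    print(probe_hits("ACGA", target))                      # no exact match
    print(probe_hits("ACGA", target, max_mismatches=1))    # relaxed
```

The same probe can therefore serve either as an exact-sequence test or as a homology test, depending on the stringency chosen.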

[0145] In addition, the present invention provides means for mapping analysis of a target sequence or sequences. Mapping will usually involve the sequential ordering of a plurality of various sequences, or may involve the localization of a particular sequence within a plurality of sequences. This may be achieved by immobilizing particular large segments onto the matrix and probing with a shorter sequence to determine which of the large sequences contain that smaller sequence. Alternatively, relatively shorter probes of known or random sequence may be immobilized to the matrix and a map of various different target sequences may be determined from overlaps. Principles of such an approach are described in some detail by Evans et al. (1989) “Physical Mapping of Complex Genomes by Cosmid Multiplex Analysis,” Proc. Natl. Acad. Sci. USA 86:5030-5034; Michiels et al. (1987) “Molecular Approaches to Genome Analysis: A Strategy for the Construction of Ordered Overlap Clone Libraries,” CABIOS 3:203-210; Olsen et al. (1986) “Random-Clone Strategy for Genomic Restriction Mapping in Yeast,” Proc. Natl. Acad. Sci. USA 83:7826-7830; Craig, et al. (1990) “Ordering of Cosmid Clones Covering the Herpes Simplex Virus Type I (HSV-I) Genome: A Test Case for Fingerprinting by Hybridization,” Nuc. Acids Res. 18:2653-2660; and Coulson, et al. (1986) “Toward a Physical Map of the Genome of the Nematode Caenorhabditis elegans,” Proc. Natl. Acad. Sci. USA 83:7821-7825; each of which is hereby incorporated herein by reference.

[0146] Fingerprinting analysis also provides a means of identification. In addition to its value in apprehension of criminals from whom a biological sample, e.g., blood, has been collected, fingerprinting can ensure personal identification for other reasons. For example, it may be useful for identification of bodies in tragedies such as fire, flood, and vehicle crashes. In other cases the identification may be useful in identification of persons suffering from amnesia, or of missing persons. Other forensics applications include establishing the identity of a person, e.g., military identification “dog tags”, or identifying the source of particular biological samples. Fingerprinting technology is described, e.g., in Carrano, et al. (1989) “A High-Resolution, Fluorescence-Based, Semi-automated method for DNA Fingerprinting,” Genomics 4: 129-136, which is hereby incorporated herein by reference. See, e.g., Table I, for nucleic acid applications, and corresponding applications may be accomplished using polypeptides.

TABLE I
VLSIPS™ TECHNOLOGY IN NUCLEIC ACIDS

I. Construction of Chips
II. Applications
    A. Sequencing
        1. Primary sequencing
        2. Secondary sequencing (sequence checking)
        3. Large scale mapping
        4. Fingerprinting
    B. Duplex/Triplex formation
        1. Antisense
        2. Sequence specific function modulation (e.g. promoter inhibition)
    C. Diagnosis
        1. Genetic markers
        2. Type markers
            a. Blood donors
            b. Tissue transplants
    D. Microbiology
        1. Clinical microbiology
        2. Food microbiology
III. Instrumentation
    A. Chip machines
    B. Detection
IV. Software Development
    A. Instrumentation software
    B. Data reduction software
    C. Sequence analysis software

[0147] The fingerprinting analysis may be used to perform various types of genetic screening. For example, a single substrate may be generated with a plurality of screening probes, allowing for the simultaneous genetic screening for a large number of genetic markers. Thus, prenatal or diagnostic screening can be simplified, economized, and made more generally accessible.

[0148] In addition to the sequencing, fingerprinting, and mapping applications, the present invention also provides means for determining specificity of interaction with particular sequences. Many of these applications were described in Ser. No. 07/362,901, now abandoned; Pirrung et al. (1992) U.S. Pat. No. 5,143,854; Ser. No. 07/435,316; and Ser. No. 07/612,671.

[0149] E. Detection Methods and Apparatus

[0150] An appropriate detection method applicable to the selected labeling method can be selected. Suitable labels include radionucleotides, enzymes, substrates, cofactors, inhibitors, magnetic particles, heavy metal atoms, and particularly fluorescers, chemiluminescers, and spectroscopic labels. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.

[0151] With an appropriate label selected, the detection system best adapted for high resolution and high sensitivity detection may be selected. As indicated above, an optically detectable system, e.g., fluorescence or chemiluminescence, would be preferred. Other detection systems may be adapted to the purpose, e.g., electron microscopy, scanning electron microscopy (SEM), scanning tunneling electron microscopy (STEM), infrared microscopy, atomic force microscopy (AFM), electrical conductance, and image plate transfer.

[0152] With a detection method selected, an apparatus for scanning the substrate will be designed. Apparatus, as described in Ser. No. 07/362,901, now abandoned; or Pirrung et al. (1992) U.S. Pat. No. 5,143,854; or Ser. No. 07/624,120, now abandoned, are particularly appropriate. Design modifications may also be incorporated therein.

[0153] F. Data Analysis

[0154] Data are analyzed by processes similar to those described below in the section describing theoretical analysis. More efficient algorithms will be mathematically devised, and will usually be designed to be performed on a computer. Various computer programs which may more quickly or efficiently make measurement samples and distinguish signal from noise will also be devised. See, particularly, Ser. No. 07/624,120, now abandoned.

[0155] The initial data resulting from the detection system is an array of data indicative of fluorescent intensity versus location on the substrate. The data are typically taken over regions substantially smaller than the area in which synthesis of a given polymer has taken place. Merely by way of example, if polymers were synthesized in squares on the substrate having dimensions of 500 microns by 500 microns, the data may be taken over regions having dimensions of 5 microns by 5 microns. In most preferred embodiments, the regions over which fluorescence data are taken across the substrate are less than about ½ the area of the regions in which individual polymers are synthesized, preferably less than 1/10 the area in which a single polymer is synthesized, and most preferably less than 1/100 the area in which a single polymer is synthesized. Hence, within any area in which a given polymer has been synthesized, a large number of fluorescence data points are collected.

[0156] A plot of number of pixels versus intensity for a scan should bear a rough resemblance to a bell curve, but spurious data are observed, particularly at higher intensities.

[0157] Since it is desirable to use an average of fluorescent intensity over a given synthesis region in determining relative binding affinity, these spurious data will tend to undesirably skew the data.

[0158] Accordingly, in one embodiment of the invention the data are corrected for removal of these spurious data points, and an average of the data points is thereafter utilized in determining relative binding efficiency. In general the data are fitted to a base curve and statistical measures are used to remove spurious data.
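The correction outlined in paragraphs [0156] to [0158] might be sketched as follows. This is an illustrative sketch only, not the procedure of the incorporated applications: the base-curve fit is replaced here by a simple median and median-absolute-deviation (MAD) test, and the function name `robust_region_mean`, the cutoff of 3.5, and the sample intensities are all assumptions made for the example.

```python
from statistics import mean, median

def robust_region_mean(pixels, cutoff=3.5):
    """Average the pixel intensities of one synthesis region after dropping outliers.

    Assumption: a pixel counts as spurious when its distance from the median
    exceeds `cutoff` times the median absolute deviation (MAD).
    """
    center = median(pixels)
    mad = median(abs(p - center) for p in pixels)
    if mad == 0:
        # At least half the pixels equal the median, so the MAD test is
        # undefined; fall back to the plain mean.
        return mean(pixels)
    kept = [p for p in pixels if abs(p - center) <= cutoff * mad]
    return mean(kept)

# Hypothetical intensities for one region, including two spurious bright pixels.
region = [100, 102, 98, 101, 99, 103, 97, 100, 950, 1200]
print(robust_region_mean(region))
```

With the hypothetical data above, the two bright pixels are rejected and the remaining eight values are averaged, so the skew described in paragraph [0157] is avoided.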

[0159] In an additional analytical tool, various degeneracy reducing analogues may be incorporated in the hybridization probes. Various aspects of this strategy are described, e.g., in Macevicz, S. (1990) PCT publication number WO 90/04652, which is hereby incorporated herein by reference.

[0160] II. THEORETICAL ANALYSIS

[0161] The principle of the hybridization sequencing procedure is based, in part, upon the ability to determine overlaps of short segments. The VLSIPS technology provides the ability to generate reagents which will saturate the possible short subsequence recognition possibilities. The principle is most easily illustrated by using a binary sequence, such as a sequence of zeros and ones. Once having illustrated the application to a binary alphabet, the principle may easily be understood to encompass three letter, four letter, five or more letter, even 20 letter alphabets. A theoretical treatment of analysis of subsequence information to reconstruction of a target sequence is provided, e.g., in Lysov, Yu., et al. (1988) Doklady Akademii Nauk SSSR 303:1508-1511; Khrapko K., et al. (1989) FEBS Letters 256:118-122; Pevzner, P. (1989) J. of Biomolecular Structure and Dynamics 7:63-69; and Drmanac, R. et al. (1989) Genomics 4:114-128; each of which is hereby incorporated herein by reference.

[0162] The reagents for recognizing the subsequences will usually be specific for recognizing a particular polymer subsequence anywhere within a target polymer. It is preferable that conditions may be devised which allow absolute discrimination between high fidelity matching and very low levels of mismatching. The reagent interaction will preferably exhibit no sensitivity to flanking sequences, to the subsequence position within the target, or to any other remote structure within the sequence. For polynucleotide sequencing, the specific reagents can be oligonucleotide probes; for polypeptides and carbohydrates, antibodies will be useful reagents. Antibody reagents should also be useful for other types of polymers.

[0163] A. Simple n-mer Structure: Theory

[0164] 1. Simple Two Letter Alphabet: Example

[0165] A simple example is presented below of how a sequence of ten digits comprising zeros and ones would be sequenceable using short segments of five digits. For example, consider the sample ten digit sequence:

[0166] 1010011100.

[0167] A VLSIPS™ Technology substrate could be constructed, as discussed elsewhere, which would have reagents attached in a defined matrix pattern which specifically recognize each of the possible five digit sequences of ones and zeros. The number of possible five digit subsequences is 2⁵=32. The number of possible different sequences 10 digits long is 2¹⁰=1,024. The five contiguous digit subsequences within a ten digit sequence number six, i.e., positioned at digits 1-5, 2-6, 3-7, 4-8, 5-9, and 6-10. It will be noted that the specific order of the digits in the sequence is important and that the order is directional, e.g., running left to right versus right to left.

[0168] The first five digit sequence contained in the target sequence is 10100. The second is 01001, the third is 10011, the fourth is 00111, the fifth is 01110, and the sixth is 11100.

[0169] The VLSIPS™ substrate would have a matrix pattern of positionally attached reagents which recognize each of the different 5-mer subsequences. Those reagents which recognize each of the 6 contained 5-mers will bind the target, and a label allows the positional determination of where the sequence specific interaction has occurred. By correlation of the position in the matrix pattern, the corresponding bound subsequences can be determined.

[0170] In the above-mentioned sequence, six different 5-mer sequenceswould be determined to be present. They would be: 10100   01001  10011   00111   01110   11100
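The decomposition of the sample target into its six contained 5-mers can be reproduced with a short sketch. The function name `contiguous_subsequences` is chosen here for illustration and does not appear in the specification.

```python
def contiguous_subsequences(target, n):
    """Return every contiguous n-long subsequence of target, in left-to-right order."""
    return [target[i:i + n] for i in range(len(target) - n + 1)]

five_mers = contiguous_subsequences("1010011100", 5)
print(five_mers)
# ['10100', '01001', '10011', '00111', '01110', '11100']
```

The list order corresponds to digit positions 1-5, 2-6, 3-7, 4-8, 5-9, and 6-10 of the target.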

[0171] Any sequence which contains the first five digit sequence, 10100, already narrows the number of possible sequences (e.g., from 1024 possible sequences) which contain it to less than about 192 possible sequences.

[0172] This 192 is derived from the observation that with the subsequence 10100 at the far left of the sequence, in positions 1-5, there are only 32 possible sequences. Likewise, for that particular subsequence in positions 2-6, 3-7, 4-8, 5-9, and 6-10. So, to sum up all of the sequences that could contain 10100, there are 32 for each position and 6 positions for a total of about 192 possible sequences. However, some of these 10 digit sequences will have been counted twice. Thus, by virtue of containing the 10100 subsequence, the number of possible 10-mer sequences has been decreased from 1024 sequences to less than about 192 sequences.
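The bound of 192 and the effect of double counting can be checked by brute force. The sketch below, with the illustrative function name `count_containing`, enumerates all 1,024 ten digit binary sequences and counts those that contain 10100 at least once.

```python
from itertools import product

def count_containing(probe, length, alphabet="01"):
    """Count sequences of the given length that contain probe at least once."""
    return sum(probe in "".join(chars) for chars in product(alphabet, repeat=length))

# 6 possible positions, each leaving 5 free digits (2**5 = 32 completions).
upper_bound = (10 - 5 + 1) * 2 ** (10 - 5)
exact = count_containing("10100", 10)
print(upper_bound, exact)
```

The exact count comes out one below the bound, because the single sequence 1010010100 contains the probe at both positions 1-5 and 6-10 and is therefore counted twice in the sum of 192.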

[0173] In this example, not only do we know that the sequence contains 10100, but we also know that it contains the second five character sequence, 01001. By virtue of knowing that the sequence contains 10100, we can look specifically to determine whether the sequence contains a subsequence of five characters which contains the four leftmost digits plus a next digit to the left. For example, we would look for a sequence of X1010, but we find that there is none. Thus, we know that the 10100 must be at the left end of the 10-mer. We would also look to see whether the sequence contains the rightmost four digits plus a next digit to the right, e.g., 0100X. We find that the sequence also contains the sequence 01001, and that X is a 1. Thus, we know at least that our target sequence has an overlap of 0100 and has the left terminal sequence 101001.

[0174] Applying the same procedure to the second 5-mer, we also know that the sequence must include a sequence of five digits having the sequence 1001Y where Y must be either 0 or 1.

[0175] We look through the fragments and we see that we have a 10011 sequence within our target, thus Y is also 1. Thus, we would know that our sequence has a sequence of the first seven being 1010011.

[0176] Moving to the next 5-mer, we know that there must be a sequence of 0011Z, where Z must be either 0 or 1. We look at the fragments produced above and see that the target sequence contains a 00111 subsequence and Z is 1. Thus, we know the sequence must start with 10100111.

[0177] The next 5-mer must be of the sequence 0111W where W must be 0 or 1. Again, looking up at the fragments produced, we see that the target sequence contains a 01110 subsequence, and W is a 0. Thus, our sequence to this point is 101001110. We know that the last 5-mer must be either 11100 or 11101. Looking above, we see that it is 11100 and that must be the last of our sequence. Thus, we have determined that our sequence must have been 1010011100.
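The stepwise reasoning of paragraphs [0173] to [0177] can be expressed as a small overlap-extension routine. This is a minimal sketch under the assumption that no overlap of n-1 digits is repeated in the target, so each step has at most one candidate; the function name `reconstruct` is illustrative and is not taken from the specification.

```python
def reconstruct(fragments):
    """Rebuild a target from its set of n-mers by extending overlaps of length n-1.

    Assumes n >= 2 and that every extension step is unambiguous.
    """
    fragments = set(fragments)
    n = len(next(iter(fragments)))
    alphabet = sorted(set("".join(fragments)))
    # The left-terminal fragment is the one that no other fragment can precede,
    # i.e. no fragment of the form X + (its first n-1 digits) is present.
    starts = [f for f in fragments
              if not any(x + f[:-1] in fragments for x in alphabet)]
    if len(starts) != 1:
        raise ValueError("left end is ambiguous")
    sequence = starts[0]
    used = {sequence}
    while True:
        overlap = sequence[-(n - 1):]
        candidates = [overlap + x for x in alphabet
                      if overlap + x in fragments and overlap + x not in used]
        if not candidates:
            return sequence          # nothing extends the right end
        if len(candidates) > 1:
            raise ValueError("extension is ambiguous")
        sequence += candidates[0][-1]
        used.add(candidates[0])

bound = ["00111", "10011", "11100", "10100", "01110", "01001"]  # any order
print(reconstruct(bound))  # 1010011100
```

The input order does not matter, which mirrors the fact that the substrate reports only which probes bound, not where in the target they bound.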

[0178] However, it will be recognized from the example above with the sequences provided therein, that the sequence analysis can start with any known positive probe subsequence. The determination may be performed by moving linearly along the sequence checking the known sequence with a limited number of next positions. Given this possibility, the sequence may be determined, besides by scanning all possible oligonucleotide probe positions, by specifically looking only where the next possible positions would be. This may increase the complexity of the scanning but may provide a longer time span dedicated towards scanning and detecting specific positions of interest relative to other sequence possibilities. Thus, the scanning apparatus could be set up to work its way along a sequence from a given contained oligonucleotide to only look at those positions on the substrate which are expected to have a positive signal.

[0179] It is seen that given a sequence, it can be de-constructed into n-mers to produce a set of internal contiguous subsequences. From any given target sequence, we would be able to determine what fragments would result. The hybridization sequence method depends, in part, upon being able to work in the reverse, from a set of fragments of known sequences to the full sequence. In simple cases, one is able to start at a single position and work in either or both directions towards the ends of the sequence as illustrated in the example.

[0180] The number of possible sequences of a given length increases very quickly with the length of that sequence. Thus, a 10-mer of zeros and ones has 1024 possibilities, a 12-mer has 4096. A 20-mer has over a million possibilities, and a 30-mer has over a billion. However, a given 30-mer has, at most, 26 different internal 5-mer sequences. Thus, a 30 character target sequence having over a billion possible sequences can be substantially defined by only 26 different 5-mers. It will be recognized that the probe oligonucleotides will preferably, but need not necessarily, be of identical length, and that the probe sequences need not necessarily be contiguous in that the overlapping subsequences need not differ by only a single subunit. Moreover, each position of the matrix pattern need not be homogeneous, but may actually contain a plurality of probes of known sequence. In addition, although all of the possible subsequence specifications would be preferred, a less than full set of sequence specifications could be used. In particular, although a substantial fraction will preferably be at least about 70%, it may be less than that. About 20% would be preferred, more preferably at least about 30% would be desired. Higher percentages would be especially preferred.

[0181] 2. Example of Four Letter Alphabet

[0182] A four letter alphabet may be conceptualized in at least two different ways from the two letter alphabet. One way is to consider the four possible values at each position and to analogize in a similar fashion to the binary example each of the overlaps. A second way is to group the binary digits into groups.

[0183] Using the first means, the overlap comparisons are performed with a four letter alphabet rather than a two letter alphabet. Then, in contrast to the binary system with 10 positions where there are 2¹⁰=1024 possible sequences, in a 4-character alphabet with 10 positions, there will actually be 4¹⁰=1,048,576 possible sequences. Thus, a four character alphabet yields a much larger number of possible sequences than a two character alphabet. Note, however, that there are still only 6 different internal 5-mers. For simplicity, we shall examine a 5 character string with 3 character subsequences. Instead of only 1 and 0, the characters may be designated, e.g., A, C, G, and T. Let us take the sequence GGCTA. The 3-mer subsequences are: GGC   GCT   CTA

[0184] Given these subsequences, there is one sequence, or at most only a few sequences which would produce that combination of subsequences, i.e., GGCTA.

[0185] Alternatively, with a four character universe, the binary system can be looked at in pairs of digits. The pairs would be 00, 01, 10, and 11. In this manner, the earlier used sequence 1010011100 is looked at as 10,10,01,11,00. Then the first character of two digits is selected from the possible universe of the four representations 00, 01, 10, and 11. Then a probe would be in an even number of digits, e.g., not five digits, but three pairs of digits or six digits. A similar comparison is performed and the possible overlaps determined. The 3-pair subsequences are:

$$\begin{matrix} 10, & 10, & 01 & & \\ & 10, & 01, & 11 & \\ & & 01, & 11, & 00 \end{matrix}$$

[0186] and the overlap reconstruction produces 10,10,01,11,00.

[0187] The latter of the two conceptual views of the 4 letter alphabet provides a representation which is similar to what would be provided in a digital computer. The applicability to a four nucleotide alphabet is easily seen by assigning, e.g., 00 to A, 01 to C, 10 to G, and 11 to T. And, in fact, if such a correspondence is used, both examples for the 4 character sequences can be seen to represent the same target sequence. The applicability of the hybridization method and its analysis for determining the ultimate sequence is easily seen if A is the representation of adenine, C is the representation of cytosine, G is the representation of guanine, and T is the representation of thymine or uracil.
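The correspondence between the two views can be made concrete with the assignment given in paragraph [0187] (00 to A, 01 to C, 10 to G, 11 to T). The helper names `bits_to_bases` and `bases_to_bits` below are illustrative only.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bits_to_bases(bits):
    """Read a binary sequence two digits at a time as a nucleotide sequence."""
    if len(bits) % 2:
        raise ValueError("binary sequence must have an even number of digits")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def bases_to_bits(bases):
    """Write a nucleotide sequence as pairs of binary digits."""
    return "".join(BASE_TO_BITS[base] for base in bases)

print(bits_to_bases("1010011100"))  # GGCTA
print(bases_to_bits("GGCTA"))       # 1010011100
```

Under this assignment the binary target 1010011100 and the four letter target GGCTA are the same sequence, and the three 3-pair subsequences of paragraph [0185] correspond to the 3-mers GGC, GCT, and CTA of paragraph [0183].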

[0188] 3. Generalization to m-Letter Alphabet

[0189] This reconstruction process may be applied to polymers of virtually any number of possible characters in the alphabet, and for virtually any length sequence to be sequenced, though limitations, as discussed below, will limit its efficiency at various extremes of length. It will be recognized that the theory can be applied to a large diversity of systems where sequence is important.

[0190] For example, the method could be applied to sequencing of a polypeptide. A polypeptide can have any of twenty natural amino acid possibilities at each position. A twenty letter alphabet is amenable to sequencing by this method so long as reagents exist for recognizing shorter subsequences therein. A preferred reagent for achieving that goal would be a set of monoclonal antibodies each of which recognizes a specific three contiguous amino acid subsequence. A complete set of antibodies which recognize all possible subsequences of a given length, e.g., 3 amino acids, and preferably with a uniform affinity, would be 20³=8000 reagents.
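The counting used throughout this section generalizes directly to an alphabet of m letters: a complete set of probes of length n has mⁿ members, while a target of length L contains at most L-n+1 contiguous n-mers. A brief sketch, with illustrative function names, reproduces the figures quoted in the text.

```python
def probe_set_size(alphabet_size, probe_length):
    """Number of distinct probes needed to cover every subsequence of the given length."""
    return alphabet_size ** probe_length

def subsequences_in_target(target_length, probe_length):
    """Maximum number of contiguous probe-length subsequences within one target."""
    return max(target_length - probe_length + 1, 0)

print(probe_set_size(2, 5))           # binary 5-mers
print(probe_set_size(4, 8))           # oligonucleotide 8-mers
print(probe_set_size(20, 3))          # tripeptide-specific antibodies
print(subsequences_in_target(30, 5))  # 5-mers within a 30-mer
```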

[0191] It will also be recognized that each target sequence which is recognized by the specific reagents need not have homogeneous termini. Thus, fragments of the entire target sequence will also be useful for hybridizing appropriate subsequences. It is, however, preferable that there not be a significant amount of labeled homogeneous contaminating extraneous sequences. This constraint does usually require the purification of the target molecule to be sequenced, but a specific label technique would dispense with a purification requirement if the unlabeled extraneous sequences do not interfere with the labeled sequences.

[0192] In addition, conformational effects of target polypeptide folding may, in certain embodiments, be negligible if the polypeptide is fragmented into sufficiently small peptides, or if the interaction is performed under conditions where conformation, but not specific interaction, is disrupted.

[0193] B. Complications

[0194] Two obvious complications exist with the method of sequence analysis by hybridization. The first results from a probe of inappropriate length while the second relates to internally repeated sequences.

[0195] The first obvious complication is a problem which arises from an inappropriate length of recognition sequence, which causes problems with the specificity of recognition. For example, if the recognized sequence is too short, every sequence which is utilized will be recognized by every probe sequence. This occurs, e.g., in a binary system where the probes are each of sequences which occur relatively frequently, e.g., a two character probe for the binary system. Each possible two character probe would be expected to appear ¼ of the time in every single two character position. Thus, the above sequence example would be recognized by each of the probes 00, 10, 01, and 11. Thus, the sequence information is virtually lost because the resolution is too low and each recognition reagent specifically binds at multiple sites on the target sequence.
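The loss of resolution with very short probes can be seen by tabulating where each probe occurs in the sample target 1010011100. The function name `probe_hits` is illustrative only.

```python
def probe_hits(target, n):
    """Map each n-long probe found in target to its 1-based start positions."""
    hits = {}
    for i in range(len(target) - n + 1):
        hits.setdefault(target[i:i + n], []).append(i + 1)
    return hits

two_mer_hits = probe_hits("1010011100", 2)
five_mer_hits = probe_hits("1010011100", 5)
print(two_mer_hits)
print(five_mer_hits)
```

For this target, every one of the four possible 2-mers binds, and each binds at more than one site, so the pattern of positive probes carries essentially no sequence information. Each of the six contained 5-mers, by contrast, occurs exactly once.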

[0196] The number of different probes which bind to a target depends on the relationship between the probe length and the target length. At the extreme of short probe length, the just mentioned problem exists of excessive redundancy and lack of resolution. The lack of stability in recognition will also be a problem with extremely short probes. At the extreme of long probe length, each entire probe sequence is on a different position of a substrate. However, a problem arises from the number of possible sequences, which goes up dramatically with the length of the sequence. Also, the specificity of recognition begins to decrease as the contribution to binding by any particular subunit may become sufficiently low that the system fails to distinguish the fidelity of recognition. Mismatched hybridization may be a problem with the polynucleotide sequencing applications, though the fingerprinting and mapping applications may not be so strict in their fidelity requirements. As indicated above, a thirty position binary sequence has over a billion possible sequences, a number which starts to become unreasonably large in its required number of different sequences, even though the target length is still very short. Preparing a substrate with all sequence possibilities for a long target may be extremely difficult due to the many different oligomers which must be synthesized.

[0197] The above example illustrates how a long target sequence may be reconstructed with a reasonably small number of shorter subsequences. Since the present day resolution of the regions of the substrate having defined oligomer probes attached to the substrate approaches about 10 microns by 10 microns for resolvable regions, about 10⁶, or 1 million, positions can be placed on a one centimeter square substrate. However, high resolution systems may have particular disadvantages which may be outweighed using the lower density substrate matrix pattern. For this reason, a sufficiently large number of probe sequences can be utilized so that any given target sequence may be determined by hybridization to a relatively small number of probes.
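The density figure in paragraph [0197] follows from simple arithmetic, sketched below. The function name `regions_per_square_cm` is illustrative, and the sketch assumes square regions tiled edge to edge with no spacing between them.

```python
MICRONS_PER_CM = 10_000

def regions_per_square_cm(region_side_microns):
    """Number of square regions of the given side that tile one square centimeter."""
    per_side = MICRONS_PER_CM // region_side_microns
    return per_side * per_side

print(regions_per_square_cm(10))   # 10 micron by 10 micron regions
print(regions_per_square_cm(500))  # 500 micron by 500 micron regions
```

At 10 microns per side, 1,000 regions fit along each edge, giving the 10⁶ positions per square centimeter stated above.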

[0198] A second complication relates to convergence of sequences to a single subsequence. This will occur when a particular subsequence is repeated in the target sequence. This problem can be addressed in at least two different ways. The first, and simpler way, is to separate the repeat sequences onto two different targets. Thus, each single target will not have the repeated sequence and can be analyzed to its end. This solution, however, complicates the analysis by requiring that some means be found for cutting at a site between the repeats. Typically a careful sequencer would want to have two intermediate cut points so that the intermediate region can also be sequenced in both directions across each of the cut points. This problem is inherent in the hybridization method for sequencing but can be minimized by using a longer known probe sequence so that the frequency of probe repeats is decreased.
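The convergence problem can be illustrated with two hypothetical targets, chosen here for the example and not taken from the specification, in which the dinucleotide AT occurs three times. Their sets of 3-mers are identical, so 3-mer probes cannot tell them apart, whereas 4-mer probes can.

```python
def spectrum(target, n):
    """Set of distinct contiguous n-mers contained in target."""
    return {target[i:i + n] for i in range(len(target) - n + 1)}

target_a = "ATGATCAT"  # hypothetical target with the repeat AT
target_b = "ATCATGAT"  # same segments between the repeats, in the other order

same_with_3mers = spectrum(target_a, 3) == spectrum(target_b, 3)
same_with_4mers = spectrum(target_a, 4) == spectrum(target_b, 4)
print(same_with_3mers, same_with_4mers)  # True False
```

Because overlaps between 3-mers are only two units long, the repeated AT lets the segments between the repeats be ordered either way. Lengthening the probe by one unit makes the overlaps longer than the repeat, which removes the ambiguity in this example.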

[0199] Knowing the sequence of flanking sequences of the repeat will simplify the use of polymerase chain reaction (PCR) or a similar technique to further definitively determine the sequence between sequence repeats. Probes can be made to hybridize to those known sequences adjacent the repeat sequences, thereby producing new target sequences for analysis. See, e.g., Innis et al. (eds.) (1990) PCR Protocols: A Guide to Methods and Applications, Academic Press; and, for methods for synthesis of oligonucleotide probes, see, e.g., Gait (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford.

[0200] Other means for dealing with convergence problems include using particular longer probes, and using degeneracy reducing analogues, see, e.g., Macevicz, S. (1990) PCT publication number WO 90/04652, which is hereby incorporated herein by reference. By use of stretches of the degeneracy reducing analogues with other probes in particular combinations, the number of probes necessary to fully saturate the possible oligomer probes is decreased. For example, with a stretch of 12-mers having the central 4-mer of degenerate nucleotides, in combination with all of the possible 8-mers, the collection numbers twice the number of possible 8-mers, e.g. 65,536+65,536=131,072, but the population provides screening equivalent to all possible 12-mers.

[0201] By way of further explanation, all possible oligonucleotide 8-mers may be depicted in the fashion:

[0202] N1-N2-N3-N4-N5-N6-N7-N8,

[0203] in which there are 4⁸=65,536 possible 8-mers. As described in Ser. No. 07/624,120, now abandoned, producing all possible 8-mers requires 4×8=32 chemical binary synthesis steps to produce the entire matrix pattern of 65,536 8-mer possibilities. By incorporating degeneracy reducing nucleotides, D's, which hybridize nonselectively to any corresponding complementary nucleotide, new oligonucleotide 12-mers can be made in the fashion:

[0204] N1-N2-N3-N4-D-D-D-D-N5-N6-N7-N8,

[0205] in which there are again, as above, only 4⁸=65,536 possible “12-mers”, which in reality only have 8 different nucleotides.

[0206] However, it can be seen that each possible 12-mer probe could be represented by a group of the two 8-mer types. Moreover, repeats of less than 12 nucleotides would not converge, or cause repeat problems in the analysis. Thus, instead of requiring a collection of probes corresponding to all 12-mers, or 4¹²=16,777,216 different 12-mers, the same information can be derived by making 2 sets of “8-mers” consisting of the typical 8-mer collection of 4⁸=65,536 and the “12-mer” set with the degeneracy reducing analogues, also requiring making 4⁸=65,536. The combination of the two sets requires making 65,536+65,536=131,072 different molecules, but gives the information of 16,777,216 molecules. Thus, incorporating the degeneracy reducing analogue decreases the number of molecules necessary to get 12-mer resolution by a factor of about 128-fold.
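The arithmetic of paragraphs [0203] to [0206] can be checked with a short sketch. The helper `gapped_probe`, which writes the N1-N2-N3-N4-D-D-D-D-N5-N6-N7-N8 probe that would pair with a given 12-mer, is an illustrative assumption about how the layout would be represented in software, with the letter D standing for a degeneracy reducing analogue.

```python
def gapped_probe(twelve_mer):
    """Return the N4-D4-N4 pattern matching a 12-mer (D marks a nonselective position)."""
    if len(twelve_mer) != 12:
        raise ValueError("expected a 12-mer")
    return twelve_mer[:4] + "DDDD" + twelve_mer[8:]

all_8mers = 4 ** 8        # ordinary 8-mer set
gapped_12mers = 4 ** 8    # only 8 of the 12 positions are selective
all_12mers = 4 ** 12      # full 12-mer set, for comparison
combined = all_8mers + gapped_12mers

print(combined, all_12mers, all_12mers // combined)  # 131072 16777216 128
print(gapped_probe("ACGTACGTTGCA"))                  # ACGTDDDDTGCA
```

The combined collection of 131,072 molecules is thus 128 times smaller than the full set of 16,777,216 12-mers, in agreement with the factor stated above.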

[0207] C. Non-polynucleotide Embodiments

[0208] The above example is directed towards a polynucleotide embodiment. This application is relatively easily achieved because the specific reagents will typically be complementary oligonucleotides, although in certain embodiments other specific reagents may be desired. For example, there may be circumstances where other than complementary base pairing will be utilized. The polynucleotide targets will usually be single strand, but may be double or triple stranded in various applications. However, a triple stranded specific interaction might be sometimes desired, or a protein or other specific binding molecule may be utilized. For example, various promoter or DNA sequence specific binding proteins might be used, including, e.g., restriction enzyme binding domains, other binding domains, and antibodies. Thus, specific recognition reagents besides oligonucleotides may be utilized.

[0209] For other polymer targets, the specific reagents will often be polypeptides. These polypeptides may be protein binding domains from enzymes or other proteins which display specificity for binding. Usually an antibody molecule may be used, and monoclonal antibodies may be particularly desired. Classical methods may be applied for preparing antibodies, see, e.g., Harlow and Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Press, New York; and Goding (1986) Monoclonal Antibodies: Principles and Practice (2d Ed.) Academic Press, San Diego. Other suitable techniques for in vitro exposure of lymphocytes to the antigens or selection of libraries of antibody binding sites are described, e.g., in Huse et al. (1989) Science 246:1275-1281; and Ward et al. (1989) Nature 341:544-546, each of which is hereby incorporated herein by reference. Unusual antibody production methods are also described, e.g., in Hendricks et al. (1989) BioTechnology, 7:1271-1274; and Hiatt et al. (1989) Nature 342:76-78, each of which is hereby incorporated herein by reference. Other molecules which may exhibit specific binding interaction may be useful for attachment to a VLSIPS substrate by various methods, including the caged biotin methods, see, e.g., Ser. No. 07/435,316, now abandoned, and Barrett et al. (1993) U.S. Pat. No. 5,252,743.

[0210] The antibody specific reagents should be particularly useful forthe polypeptide, carbohydrate, and synthetic polymer applications.Individual specific reagents might be generated by an automated processto generate the number of reagents necessary to advantageously use thehigh density positional matrix pattern. In an alternative approach, aplurality of hybridoma cells may be screened for their ability to bindto a VLSIPS matrix possessing the desired sequences whose bindingspecificity is desired. Each cell might be individually grown up and itsbinding specificity determined by VLSIPS apparatus and technology. Analternative strategy would be to expose the same VLSIPS matrix to apolyclonal serum of high titer. By a successively large volume of serumand different animals, each region of the VLSIPS substrate would haveattached to it a substantial number of antibody molecules withspecificity of binding. The substrate, with non-covalently boundantibodies could be derivatized and the antibodies transferred to anadjacent second substrate in the matrix pattern in which the antibodymolecules had attached to the first matrix. If the sensitivity ofdetection of binding interaction is sufficiently high, such a lowefficiency transfer of antibody molecules may produce a sufficientlyhigh signal to be useful for many purposes, including the sequencingapplications.

[0211] In another embodiment, capillary forces may be used to transfer the selected reagents to a new matrix, to which the reagents would be positionally attached in the pattern of the recognized sequences. Or, the reagents could be transversely electrophoresed, magnetically transferred, or otherwise transported to a new substrate in their retained positional pattern.

[0212] III. POLYNUCLEOTIDE SEQUENCING

[0213] In principle, the making of a substrate having a positionally defined matrix pattern of all possible oligonucleotides of a given length involves a conceptually simple method of synthesizing each and every different possible oligonucleotide, and affixing them to a definable position. Oligonucleotide synthesis is presently mechanized and enabled by current technology, see, e.g., Ser. No. 07/362,901, now abandoned; Pirrung et al. (1992) U.S. Pat. No. 5,143,854; and instruments supplied by Applied Biosystems, Foster City, California.

[0214] A. Preparation of Substrate Matrix

[0215] The production of the collection of specific oligonucleotides used in polynucleotide sequencing may be accomplished in at least two different ways. Present technology certainly allows production of ten nucleotide oligomers on a solid phase or other synthesizing system. See, e.g., instrumentation provided by Applied Biosystems, Foster City, California. Although a single oligonucleotide can be relatively easily made, a large collection of them would typically require a fairly large amount of time and investment. For example, there are 4¹⁰=1,048,576 possible ten nucleotide oligomers. Present technology allows making each and every one of them in a separate purified form though such might be costly and laborious.

[0216] Once the desired repertoire of possible oligomer sequences of a given length has been synthesized, this collection of reagents may be individually positionally attached to a substrate, thereby allowing a batchwise hybridization step. Present technology also would allow the possibility of attaching each and every one of these 10-mers to a separate specific position on a solid matrix. This attachment could be automated in any of a number of ways, particularly through the use of a caged biotin type linking. This would produce a matrix having each of the different possible 10-mers.

[0217] A batchwise hybridization is much preferred because of its reproducibility and simplicity. An automated process of attaching various reagents to positionally defined sites on a substrate is provided in Pirrung et al. (1992) U.S. Pat. No. 5,143,854; Ser. No. 07/624,120, now abandoned; and Barrett et al. (1993) U.S. Pat. No. 5,252,743; each of which is hereby incorporated herein by reference.

[0218] Instead of separate synthesis of each oligonucleotide, theseoligonucleotides are conveniently synthesized in parallel by sequentialsynthetic processes on a defined matrix pattern as provided in Pirrunget al. (1992) U.S. Pat. No. 5,143,854; and Ser. No. 07/624,120, nowabandoned, which are incorporated herein by reference. Here, theoligonucleotides are synthesized stepwise on a substrate at positionallyseparate and defined positions. Use of photosensitive blocking reagentsallows for defined sequences of synthetic steps over the surface of amatrix pattern. By use of the binary masking strategy, the surface ofthe substrate can be positioned to generate a desired pattern ofregions, each having a defined sequence oligonucleotide synthesized andimmobilized thereto.

[0219] Although the prior art technology can be used to generate thedesired repertoire of oligonucleotide probes, an efficient and costeffective means would be to use the VLSIPS technology described inPirrung et al. (1992) U.S. Pat. No. 5,143,854 and Ser. No. 07/624,120,now abandoned. In this embodiment, the photosensitive reagents involvedin the production of such a matrix are described below.

[0220] The regions for synthesis may be very small, usually less than about 100 μm×100 μm, more usually less than about 50 μm×50 μm. The photolithography technology allows synthetic regions of less than about 10 μm×10 μm, about 3 μm×3 μm, or less. The detection system also may detect regions of such size, though larger areas are more easily and reliably measured.

[0221] At a size of about 30 microns by 30 microns, one million regions would take about 11 square centimeters, or a single wafer of about 4 centimeters by 4 centimeters. Thus the present technology provides for making a single matrix of that size having all one million plus possible oligonucleotides. Region size is sufficiently small to correspond to densities of at least about 5 regions/cm², 20 regions/cm², 50 regions/cm², 100 regions/cm², and greater, including 300 regions/cm², 1000 regions/cm², 3K regions/cm², 10K regions/cm², 30K regions/cm², 100K regions/cm², 300K regions/cm² or more, even in excess of one million regions/cm².
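
The area and density figures above can be checked with simple arithmetic. The following minimal sketch assumes square regions packed edge to edge with no border or spacing, an idealization not stated in the text; the function names are illustrative only. Under that assumption, the 4¹⁰ possible 10-mers at 30 microns per side occupy roughly 9.4 cm², which is of the same order as the figure given above and fits within a wafer of about 4 centimeters by 4 centimeters.

```python
# Illustrative arithmetic only. Assumes ideal edge-to-edge packing of
# square regions (an assumption, not a statement from the specification).

def array_area_cm2(num_regions: int, region_side_um: float) -> float:
    """Total area in cm^2 occupied by num_regions square regions."""
    side_cm = region_side_um * 1e-4  # 1 micron = 1e-4 cm
    return num_regions * side_cm ** 2


def regions_per_cm2(region_side_um: float) -> float:
    """Number of square regions of the given side that fit in 1 cm^2."""
    side_cm = region_side_um * 1e-4
    return 1.0 / side_cm ** 2


if __name__ == "__main__":
    n_10mers = 4 ** 10  # 1,048,576 possible ten nucleotide oligomers
    print("regions:", n_10mers)
    print("area at 30 um (cm^2):", round(array_area_cm2(n_10mers, 30.0), 2))
    print("density at 30 um (regions/cm^2):", round(regions_per_cm2(30.0)))
    print("density at 10 um (regions/cm^2):", round(regions_per_cm2(10.0)))
```

At a region side of about 10 microns the same arithmetic gives on the order of one million regions/cm², matching the upper end of the density range listed above.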

[0222] Although the pattern of the regions which contain specific sequences is theoretically not important, for practical reasons certain patterns will be preferred in synthesizing the oligonucleotides. The application of binary masking algorithms for generating the pattern of known oligonucleotide probes is described in related Ser. No. 07/624,120, now abandoned, which was filed simultaneously with this application. By use of these binary masks, a highly efficient means is provided for producing the substrate with the desired matrix pattern of different sequences. Although the binary masking strategy allows for the synthesis of all lengths of polymers, the strategy may be easily modified to provide only polymers of a given length. This is achieved by omitting steps where a subunit is not attached.

[0223] The strategy for generating a specific pattern may take any of a number of different approaches. These approaches are well described in related application Ser. No. 07/624,120, now abandoned, and include a number of binary masking approaches which will not be exhaustively discussed herein. However, the binary masking and binary synthesis approaches provide a maximum of diversity with a minimum number of actual synthetic steps.
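
The way a binary scheme yields many distinct products from few coupling steps can be sketched as follows. This is an illustration only and is not the masking algorithm of Ser. No. 07/624,120: it assumes, hypothetically, that region r is exposed in step j exactly when bit j of r is set, so that n coupling steps yield 2ⁿ regions carrying every ordered subset of the steps, including products of all lengths.

```python
from typing import List


def binary_synthesis(steps: List[str]) -> List[str]:
    """Simulate a binary masking run.

    steps   : monomers coupled in order, one per synthetic step
    returns : product in each of 2**len(steps) regions

    Hypothetical mask layout: region r is exposed (deprotected and
    coupled) in step j if and only if bit j of r is set.
    """
    n_regions = 2 ** len(steps)
    products = [""] * n_regions
    for j, monomer in enumerate(steps):
        for r in range(n_regions):
            if (r >> j) & 1:  # region r is illuminated through mask j
                products[r] += monomer
    return products


if __name__ == "__main__":
    for region, product in enumerate(binary_synthesis(["A", "C", "G", "T"])):
        print(f"{region:2d} {product or '(empty)'}")
```

Four coupling steps here give sixteen distinct products, from the empty region through the full-length "ACGT". Restricting the output to polymers of a single length would, as noted above, require a modified set of steps; the sketch does not attempt this.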

[0224] The length of oligonucleotides used in sequencing applications will be selected on criteria determined to some extent by the practical limits discussed above. For example, if probes are made as oligonucleotides, there will be 65,536 possible eight nucleotide sequences. If a nine subunit oligonucleotide is selected, there are 262,144 possible permutations of sequences. If a ten-mer oligonucleotide is selected, there are 1,048,576 possible permutations of sequences. As the number gets larger, the required number of positionally defined subunits necessary to saturate the possibilities also increases. With respect to hybridization conditions, the length of the matching necessary to confer stability under the conditions selected can be compensated for. See, e.g., Kanehisa, M. (1984) Nuc. Acids Res. 12:203-213, which is hereby incorporated herein by reference.

[0225] As described in greater detail below for oligonucleotide probes, the VLSIPS technology would typically use a photosensitive protective group on an oligonucleotide. Sample oligonucleotides are shown in FIG. 1. In particular, the photoprotective group on the nucleotide molecules may be selected from a wide variety of positive light reactive groups preferably including nitro aromatic compounds such as o-nitro-benzyl derivatives or benzylsulfonyl. See, e.g., Gait (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford, which is hereby incorporated herein by reference. In a preferred embodiment, 6-nitro-veratryl oxycarbonyl (NVOC), 2-nitrobenzyl oxycarbonyl (NBOC), or α,α-dimethyl-dimethoxybenzyl oxycarbonyl (DDZ) is used. Photoremovable protective groups are described in, e.g., Patchornik (1970) J. Amer. Chem. Soc. 92:6333-6335; and Amit et al. (1974) J. Organic Chem. 39:192-196; each of which is hereby incorporated herein by reference.

[0226] A preferred linker for attaching the oligonucleotide to a silicon matrix is illustrated in FIG. 2. A more detailed description is provided below. A photosensitive blocked nucleotide may be attached to specific locations of unblocked prior cycles of attachments on the substrate and can be successively built up to the correct length oligonucleotide probe.

[0227] It should be noted that multiple substrates may be simultaneously exposed to a single target sequence where each substrate is a duplicate of one another or where, in combination, multiple substrates together provide the complete or desired subset of possible subsequences. This provides the opportunity to overcome a limitation of the density of positions on a single substrate by using multiple substrates. In the extreme case, each probe might be attached to a single bead or substrate and the beads sorted by whether there is a binding interaction. Those beads which do bind might be encoded to indicate the subsequence specificity of reagents attached thereto.

[0228] Then, the target may be bound to the whole collection of beads and those beads that have appropriate specific reagents on them will bind to the target. Then a sorting system may be utilized to sort those beads that actually bind the target from those that do not. This may be accomplished by presently available cell sorting devices or a similar apparatus. After the relatively small number of beads which have bound the target have been collected, the encoding scheme may be read off to determine the specificity of the reagent on the bead. An encoding system may include a magnetic system, a shape encoding system, a color encoding system, or a combination of any of these, or any other encoding system. Once again, with the collection of specific interactions that have occurred, the binding may be analyzed for sequence information, fingerprint information, or mapping information.

[0229] The parameters of polynucleotide sizes of both the probes and target sequences are determined by the applications and other circumstances. The length of the oligonucleotide probes used will depend in part upon the limitations of the VLSIPS technology to provide the number of desired probes. For example, in an absolute sequencing application, it is often useful to have virtually all of the possible oligonucleotides of a given length. As indicated above, there are 65,536 8-mers, 262,144 9-mers, 1,048,576 10-mers, 4,194,304 11-mers, etc. As the length of the oligomer increases, the number of different probes which must be synthesized also increases at a rate of a factor of 4 for every additional nucleotide. Eventually the size of the matrix and the limitations in the resolution of regions in the matrix will reach the point where an increase in number of probes becomes disadvantageous. However, this sequencing procedure requires that the system be able to distinguish, by appropriate selection of hybridization and washing conditions, between binding of absolute fidelity and binding of complementary sequences containing mismatches. On the other hand, if the fidelity is unnecessary, this discrimination is also unnecessary and a significantly longer probe may be used. Significantly longer probes would typically be useful in fingerprinting or mapping applications.
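
The fourfold growth in probe count per added nucleotide can be tabulated directly. The following is a minimal sketch; the function names, and the idea of deriving a maximum probe length from a fixed number of available regions, are illustrative assumptions rather than part of the described method.

```python
def num_probes(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct oligonucleotides of the given length."""
    return alphabet_size ** length


def max_complete_length(available_regions: int, alphabet_size: int = 4) -> int:
    """Longest probe length whose complete set fits in available_regions."""
    length = 0
    while num_probes(length + 1, alphabet_size) <= available_regions:
        length += 1
    return length


if __name__ == "__main__":
    for n in range(8, 12):
        print(f"{n}-mers: {num_probes(n):,}")
    # A hypothetical matrix with 1.1 million regions holds all 10-mers
    # but not all 11-mers.
    print(max_complete_length(1_100_000))
```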

[0230] The length of the probe is selected to allow the probe to bind with specificity to possible targets. The hybridization conditions are also very important in that they will determine how closely the homology of complementary binding will be detected. In fact, a single target may be evaluated at a number of different conditions to determine its spectrum of specificity for binding particular probes. This may find use in a number of other applications besides polynucleotide sequencing, fingerprinting, or mapping. For example, it will be desired to determine the spectrum of binding affinities and specificities of cell surface antigens with binding by particular antibodies immobilized on the substrate surface, particularly under different interaction conditions. In a related fashion, different regions with reagents having differing affinities or levels of specificity may allow such a spectrum to be defined using a single incubation, where various regions, at a given hybridization condition, show the binding affinity. For example, fingerprint probes of various lengths, or with specific defined non-matches, may be used. Unnatural nucleotides or nucleotides exhibiting modified specificity of complementary binding are described in greater detail in Macevicz (1990) PCT pub. No. WO 90/04652; and see the section on modified nucleotides in the Sigma Chemical Company catalogue.

[0231] B. Labeling Target Nucleotide

[0232] The label used to detect the target sequences will be determined, in part, by the detection methods being applied. Thus, the labeling method and label used are selected in combination with the actual detecting systems being used.

[0233] Once a particular label has been selected, appropriate labeling protocols will be applied, as described below for specific embodiments. Standard labeling protocols for nucleic acids are described, e.g., in Sambrook et al.; Kambara, H. et al. (1988) BioTechnology 6:816-821; Smith, L. et al. (1985) Nuc. Acids Res. 13:2399-2412; for polypeptides, see, e.g., Allen G. (1989) Sequencing of Proteins and Peptides, Elsevier, N.Y., especially chapter 5, and Greenstein and Winitz (1961) Chemistry of the Amino Acids, Wiley and Sons, New York. Carbohydrate labeling is described, e.g., in Chaplin and Kennedy (1986) Carbohydrate Analysis: A Practical Approach, IRL Press, Oxford. Labeling of other polymers will be performed by methods applicable to them as recognized by a person having ordinary skill in manipulating the corresponding polymer.

[0234] In some embodiments, the target need not actually be labeled if a means for detecting where interaction takes place is available. As described below, for a nucleic acid embodiment, such may be provided by an intercalating dye which intercalates only into double stranded segments, e.g., where interaction occurs. See, e.g., Sheldon et al. U.S. Pat. No. 4,582,789.

[0235] In many uses, the target sequence will be absolutely homogeneous, both with respect to the total sequence and with respect to the ends of each molecule. Homogeneity with respect to sequence is important to avoid ambiguity. It is preferable that the target sequences of interest not be contaminated with a significant amount of labeled contaminating sequences. The extent of allowable contamination will depend on the sensitivity of the detection system and the inherent signal to noise of the system. Homogeneous contamination sequences will be particularly disruptive of the sequencing procedure.

[0236] However, although the target polynucleotide must have a unique sequence, the target molecules need not have identical ends. In fact, the homogeneous target molecule preparation may be randomly sheared to increase the number of molecules. Since the total information content remains the same, the shearing results only in a higher number of distinct sequences which may be labeled and bind to the probe. This fragmentation may give a vastly superior signal relative to a preparation of the target molecules having homogeneous ends. The signal for the hybridization is likely to be dependent on the numerical frequency of the target-probe interactions. If a sequence is individually found on a larger number of separate molecules, a better signal will result. In fact, shearing a homogeneous preparation of the target may often be preferred before the labeling procedure is performed, thereby producing a large number of labeling groups associated with each subsequence.

[0237] C. Hybridization Conditions

[0238] The hybridization conditions between probe and target should be selected such that the specific recognition interaction, i.e., hybridization, of the two molecules is both sufficiently specific and sufficiently stable. See, e.g., Hames and Higgins (1985) Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford. These conditions will be dependent both on the specific sequence and often on the guanine and cytosine (GC) content of the complementary hybrid strands. The conditions may often be selected such that hybridization is universally equally stable, independent of the specific sequences involved. This typically will make use of a reagent such as an alkylammonium buffer. See Wood et al. (1985) “Base Composition-independent Hybridization in Tetramethylammonium Chloride: A Method for Oligonucleotide Screening of Highly Complex Gene Libraries,” Proc. Natl. Acad. Sci. USA, 82:1585-1588; and Krupov et al. (1989) “An Oligonucleotide Hybridization Approach to DNA Sequencing,” FEBS Letters, 256:118-122; each of which is hereby incorporated herein by reference. An alkylammonium buffer tends to minimize differences in hybridization rate and stability due to GC content. By virtue of the fact that sequences then hybridize with approximately equal affinity and stability, there is relatively little bias in strength or kinetics of binding for particular sequences. Temperature and salt conditions along with other buffer parameters should be selected such that the kinetics of renaturation are essentially independent of the specific target subsequence or oligonucleotide probe involved. In order to ensure this, the hybridization reactions will usually be performed in a single incubation of all the substrate matrices together exposed to the identical target probe solution under the same conditions.

[0239] Alternatively, various substrates may be individually treated differently. Different substrates may be produced, each having reagents which bind to target subsequences with substantially identical stabilities and kinetics of hybridization. For example, all of the high GC content probes could be synthesized on a single substrate which is treated accordingly. In this embodiment, the alkylammonium buffers could be unnecessary. Each substrate is then treated in a manner such that the collection of substrates shows essentially uniform binding, and the hybridization data of target binding to the individual substrate matrix is combined with the data from other substrates to derive the necessary subsequence binding information. The hybridization conditions will usually be selected to be sufficiently specific such that the fidelity of base matching will be properly discriminated. Of course, control hybridizations should be included to determine the stringency and kinetics of hybridization.
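
The grouping of probes onto separate substrates by GC content can be sketched as follows. This is a minimal illustration under stated assumptions: the number of substrates and the equal-width GC bins are hypothetical choices, and GC fraction is used only as a simple proxy for hybrid stability.

```python
from typing import List


def gc_fraction(probe: str) -> float:
    """Fraction of G and C residues in a probe sequence."""
    probe = probe.upper()
    return (probe.count("G") + probe.count("C")) / len(probe)


def partition_by_gc(probes: List[str], n_substrates: int = 3) -> List[List[str]]:
    """Assign each probe to one of n_substrates equal-width GC bins.

    Bin 0 holds the lowest GC content probes and the last bin the
    highest; each bin would then be hybridized under its own conditions.
    """
    substrates: List[List[str]] = [[] for _ in range(n_substrates)]
    for probe in probes:
        index = min(int(gc_fraction(probe) * n_substrates), n_substrates - 1)
        substrates[index].append(probe)
    return substrates


if __name__ == "__main__":
    low, mid, high = partition_by_gc(["AATT", "ACGT", "GGCC", "ATAT", "GCGC"])
    print("low GC: ", low)
    print("mid GC: ", mid)
    print("high GC:", high)
```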

[0240] D. Detection; VLSIPS™ Technology Scanning

[0241] The next step of the sequencing process by hybridization involves labeling of target polynucleotide molecules. A quickly and easily detectable signal is preferred. The VLSIPS™ Technology apparatus is designed to easily detect a fluorescent label, so fluorescent tagging of the target sequence is preferred. Other suitable labels include heavy metal labels, magnetic probes, chromogenic labels (e.g., phosphorescent labels, dyes, and fluorophores), spectroscopic labels, enzyme linked labels, radioactive labels, and labeled binding proteins. Additional labels are described in U.S. Pat. No. 4,366,241, which is incorporated herein by reference.

[0242] The detection methods used to determine where hybridization has taken place will typically depend upon the label selected above. Thus, for a fluorescent label a fluorescent detection step will typically be used. Pirrung et al. (1992) U.S. Pat. No. 5,143,854 and Ser. No. 07/624,120, now abandoned, describe apparatus and mechanisms for scanning a substrate matrix using fluorescence detection, but a similar apparatus is adaptable for other optically detectable labels.

[0243] The detection method provides a positional localization of the region where hybridization has taken place. However, the position is correlated with the specific sequence of the probe since the probe has specifically been attached or synthesized at a defined substrate matrix position. Having collected all of the data indicating the subsequences present in the target sequence, this data may be aligned by overlap to reconstruct the entire sequence of the target, as illustrated above.
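
The correlation between matrix position and probe sequence can be sketched as a simple lookup. The layout below is purely hypothetical: it assumes regions are numbered in row-major order and that the region number, written in base 4 with digits mapped to A, C, G, T, spells the probe. An actual substrate would use whatever layout its synthesis masks produced.

```python
from typing import Iterable, Set, Tuple

BASES = "ACGT"  # hypothetical digit-to-base mapping: 0=A, 1=C, 2=G, 3=T


def probe_at(row: int, col: int, n_cols: int, length: int) -> str:
    """Probe sequence at (row, col) under the assumed base-4 layout."""
    index = row * n_cols + col
    digits = []
    for _ in range(length):
        index, digit = divmod(index, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))  # most significant base first


def decode_hits(hits: Iterable[Tuple[int, int]], n_cols: int, length: int) -> Set[str]:
    """Translate positions where signal was detected into subsequences."""
    return {probe_at(row, col, n_cols, length) for row, col in hits}


if __name__ == "__main__":
    # An 8 x 8 matrix holds all 64 possible 3-mers.
    print(sorted(decode_hits([(0, 1), (3, 3), (7, 7)], n_cols=8, length=3)))
```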

[0244] It is also possible to dispense with actual labeling if some means for detecting the positions of interaction between the sequence specific reagent and the target molecule are available. This may take the form of an additional reagent which can indicate the sites either of interaction, or the sites of lack of interaction, e.g., a negative label. For the nucleic acid embodiments, locations of double strand interaction may be detected by the incorporation of intercalating dyes, or other reagents such as antibody or other reagents that recognize helix formation, see, e.g., Sheldon, et al. (1986) U.S. Pat. No. 4,582,789, which is hereby incorporated herein by reference.

[0245] E. Analysis

[0246] Although the reconstruction can be performed manually as illustrated above, a computer program will typically be used to perform the overlap analysis. A program may be written and run on any of a large number of different computer hardware systems. The variety of operating systems and languages usable will be recognized by a computer software engineer. Various different languages may be used, e.g., BASIC; C; PASCAL; etc. A simple flow chart of data analysis is illustrated in FIG. 1.
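
A minimal sketch of such an overlap analysis is given here in Python, chosen only for brevity. It is not the program of the flow chart and rests on simplifying assumptions: every subsequence of length k in the target is detected exactly once with no false positives, and each overlap of k-1 nucleotides occurs only once in the target, so that the chain of overlaps is unambiguous. The function name is illustrative.

```python
from typing import Dict, Iterable, List


def reconstruct(subsequences: Iterable[str]) -> str:
    """Rebuild a target from its set of k-mers by (k-1)-base overlaps.

    Raises ValueError when the overlaps are ambiguous or incomplete.
    """
    kmers = set(subsequences)
    if not kmers:
        raise ValueError("no subsequences supplied")
    k = len(next(iter(kmers)))
    if k < 2 or any(len(s) != k for s in kmers):
        raise ValueError("all subsequences must share one length >= 2")

    # The first k-mer is the one whose prefix is no other k-mer's suffix.
    suffixes = {s[1:] for s in kmers}
    starts = [s for s in kmers if s[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("start of target is ambiguous")

    by_prefix: Dict[str, List[str]] = {}
    for s in kmers:
        by_prefix.setdefault(s[:-1], []).append(s)

    sequence = starts[0]
    used = {sequence}
    while True:
        candidates = [s for s in by_prefix.get(sequence[-(k - 1):], [])
                      if s not in used]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("branch point: overlap is not unique")
        sequence += candidates[0][-1]  # extend by the one new base
        used.add(candidates[0])

    if used != kmers:
        raise ValueError("some subsequences could not be placed")
    return sequence


if __name__ == "__main__":
    print(reconstruct(["GTTG", "ACGT", "TGCA", "CGTT", "TTGC"]))
```

Targets containing repeated overlaps produce a branch point, which the sketch reports rather than resolves.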

[0247] F. Substrate Reuse

[0248] Finally, after a particular sequence has been hybridized and the pattern of hybridization analyzed, the matrix substrate should be reusable and readily prepared for exposure to a second or subsequent target polynucleotide. In order to do so, the hybrid duplexes are disrupted and the matrix treated in a way which removes all traces of the original target. The matrix may be treated with various detergents or solvents to which the substrate, the oligonucleotide probes, and the linkages to the substrate are inert. This treatment may include an elevated temperature treatment, treatment with organic or inorganic solvents, modifications in pH, and other means for disrupting specific interaction. Thereafter, a second target may actually be applied to the recycled matrix and analyzed as before.

[0249] G. Non-Polynucleotide Aspects

[0250] Although the sequencing, fingerprinting, and mapping functions will make use of the natural sequence recognition property of complementary nucleotide sequences, the non-polynucleotide sequences typically require other sequence recognition reagents. These reagents will take the form, typically, of proteins exhibiting binding specificity, e.g., enzyme binding sites or antibody binding sites.

[0251] Enzyme binding sites may be derived from promoter proteins, restriction enzymes, and the like. See, e.g., Stryer, L. (1988) Biochemistry, W.H. Freeman, Palo Alto. Antibodies will typically be produced using standard procedures, see, e.g., Harlow and Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Press, New York; and Goding (1986) Monoclonal Antibodies: Principles and Practice, (2d Ed.) Academic Press, San Diego.

[0252] Typically, an antigen, or collection of antigens, is presented to an immune system. This may take the form of synthesized short polymers produced by the VLSIPS technology, or by the other synthetic means, or from isolation of natural products. For example, antigen for the polypeptides may be made by the VLSIPS technology, by standard peptide synthesis, by isolation of natural proteins with or without degradation to shorter segments, or by expression of a collection of short nucleic acids of random or defined sequences. See, e.g., Tuerk and Gold (1990) Science 249:505-510, for generation of a collection of randomly mutagenized oligonucleotides useful for expression.

[0253] The antigen or collection is presented to an appropriate immune system, e.g., to a whole animal as in a standard immunization protocol, or to a collection of immune cells or equivalent. In particular, see Ward et al. (1989) Nature 341:544-546; and Huse et al. (1989) Science 246:1275-1281, each of which is hereby incorporated herein by reference.

[0254] A large diversity of antibodies will be generated, some of which have specificities for the desired sequences. Antibodies having the desired sequence specificities may be purified by isolating the cells producing them. For example, a VLSIPS substrate with the desired antigens synthesized thereon may be used to isolate cells with cell surface reagents which recognize the antigens. The VLSIPS substrate may be used as an affinity reagent to select and recover the appropriate cells. Antibodies from those cells may be attached to a substrate using the caged biotin methodology, or by attaching a targeting molecule, e.g., an oligonucleotide. Alternatively, the supernatants from antibody producing cells can be easily assayed using a VLSIPS substrate to identify the cells producing the appropriate antibodies.

[0255] Although cells may be isolated, specific antibody molecules which perform the sequence recognition will also be sufficient. Preferably populations of antibody with a known specificity can be isolated. Supernatants from a large population of producing cells may be passed over a VLSIPS substrate to bind to the desired antigens attached to the substrate. When a sufficient density of antibody molecules is attached, they may be removed by an automated process, preferably as antibody populations exhibiting specificity of binding.

[0256] In one particular embodiment, a VLSIPS substrate, e.g., with a large plurality of fingerprint antigens attached thereto, is used to isolate antibodies from a supernatant of a population of cells producing antibodies to the antigens. Using the substrate as an affinity reagent, the antibodies will attach to the appropriate positionally defined antigens. The antibodies may be carefully removed therefrom, preferably by an automated system which retains their homogeneous specificities. The isolated antibodies can be attached to a new substrate in a positionally defined matrix pattern.

[0257] In a further embodiment, these spatially separated antibodies may be isolated using a specific targeting method for isolation. In this embodiment, a linker molecule which attaches to a particular portion of the antibody, preferably away from the binding site, can be attached to the antibodies. Various reagents will be used, including staphylococcus protein A or antibodies which bind to domains remote from the binding site. Alternatively, the antibodies in the population, before affinity purification, may be derivatized with an appropriate reagent compatible with new VLSIPS synthesis. A preferred reagent is a nucleotide which can serve as a linker to synthetic VLSIPS steps for synthesizing a specific sequence thereon. Then, by successive VLSIPS cycles, each of the antibodies attached to the defined antigen regions can have a defined oligonucleotide synthesized thereon and corresponding in area to the region of the substrate having each antigen attached. These defined oligonucleotides will be useful as targeting reagents to attach those antibodies possessing the same target sequence specificity at defined positions on a new substrate, by virtue of having bound to the antigen region, to a new VLSIPS substrate having the complementary target oligonucleotides positionally located on it. In this fashion, a VLSIPS substrate having the desired antigens attached thereto can be used to generate a second VLSIPS substrate with positionally defined reagents which recognize those antigens.

[0258] The selected antigens will typically be selected to be those which define particular functionalities or properties, so as to be useful for fingerprinting and other uses. They will also be useful for mapping and sequencing embodiments.

[0259] IV. Fingerprinting

[0260] A. General

[0261] Many of the procedures and techniques used in the polynucleotide sequencing section are also appropriate for fingerprinting applications. See, e.g., Poustka, et al. (1986) Cold Spring Harbor Symposia on Quant. Biol., vol. LI, 131-139, Cold Spring Harbor Press, New York; which is hereby incorporated herein by reference. The fingerprinting method provided herein is based, in part, upon the ability to positionally localize a large number of different specific probes onto a single substrate. This high density matrix pattern provides the ability to screen for, or detect, a very large number of different sequences simultaneously. In fact, depending upon the hybridization conditions, fingerprinting to the resolution of virtually absolute matching of sequence is possible, thereby approaching an absolute sequencing embodiment. And the sequencing embodiment is very useful in identifying the probes useful in further fingerprinting uses. For example, characteristic features of genetic sequences will be identified as being diagnostic of the entire sequence. However, in most embodiments, longer probes and targets will be used, for which slight mismatching may not need to be resolved.

[0262] B. Preparation of Substrate Matrix

[0263] A collection of specific probes may be produced by either of the methods described above in the section on sequencing. Specific oligonucleotide probes of desired lengths may be individually synthesized on a standard oligonucleotide synthesizer. The length of these probes is limited only by the ability of the synthesizer to continue to accurately synthesize a molecule. Oligonucleotides or sequence fragments may also be isolated from natural sources. Biological amplification methods, e.g., polymerase chain reaction, may be coupled with synthetic procedures.

[0264] In one embodiment, the individually isolated probes may be attached to the matrix at defined positions. These probe reagents may be attached by an automated process making use of the caged biotin methodology described in Ser. No. 07/612,671, or using photochemical reagents, see, e.g., Dattagupta et al. (1985) U.S. Pat. No. 4,542,102 and (1987) U.S. Pat. No. 4,713,326. Each individually purified reagent can be attached individually at specific locations on a substrate.

[0265] In another embodiment, the VLSIPS synthesizing technique may be used to synthesize the desired probes at specific positions on a substrate. The probes may be synthesized by successively adding appropriate monomer subunits, e.g., nucleotides, to generate the desired sequences.

[0266] In another embodiment, a relatively short specific oligonucleotide is used which serves as a targeting reagent for positionally directing the sequence recognition reagent. For example, the sequence specific reagents having a separate additional sequence recognition segment (usually of a different polymer from the target sequence) can be directed to target oligonucleotides attached to the substrate. By use of non-natural targeting reagents, e.g., unusual nucleotide analogues which pair with other unnatural nucleotide analogues and which do not interfere with natural nucleotide interactions, the natural and non-natural portions can coexist on the same molecule without interfering with their individual functionalities. This can combine both a synthetic and biological production system analogous to the technique for targeting monoclonal antibodies to locations on a VLSIPS substrate at defined positions. Unnatural optical isomers of nucleotides may be useful unnatural reagents subject to similar chemistry, but incapable of interfering with the natural biological polymers. See also, Ser. No. 07/626,730, which is hereby incorporated herein by reference.

[0267] After the separate substrate attached reagents are attached to the targeting segment, the two are crosslinked, thereby permanently attaching them to the substrate. Suitable crosslinking reagents are known, see, e.g., Dattagupta et al. (1985) U.S. Pat. No. 4,542,102 and (1987) “Coupling of nucleic acids to solid support by photochemical methods,” U.S. Pat. No. 4,713,326, each of which is hereby incorporated herein by reference. Similar linkages for attachment of proteins to a solid substrate are provided, e.g., in Merrifield (1986) Science 232:341-347, which is hereby incorporated herein by reference.

[0268] C. Labeling Target Nucleotides

[0269] The labeling procedures used in the sequencing embodiments will also be applicable in the fingerprinting embodiments. However, since the fingerprinting embodiments often will involve relatively large target molecules and relatively short oligonucleotide probes, the amount of signal necessary to incorporate into the target sequence may be less critical than in the sequencing applications. For example, a relatively long target with a relatively small number of labels per molecule may be easily amplified or detected because of the relatively large target molecule size.

[0270] In various embodiments, it may be desired to cleave the target into smaller segments as in the sequencing embodiments. The labeling procedures and cleavage techniques described in the sequencing embodiments would usually also be applicable here.

[0271] D. Hybridization Conditions

[0272] The hybridization conditions used in fingerprinting embodiments will typically be less critical than for the sequencing embodiments. The reason is that the amount of mismatching which may be useful in providing the fingerprinting information would typically be far greater than that necessary in sequencing uses. For example, Southern hybridizations do not typically distinguish between slightly mismatched sequences. Under these circumstances, valuable fingerprinting information may be obtained with less stringent hybridization conditions. However, since the entire substrate is typically exposed to the target molecule at one time, the binding affinity of the probes should usually be of approximately comparable levels. For this reason, if oligonucleotide probes are being used, their lengths should be approximately comparable and will be selected to hybridize under conditions which are common for most of the probes on the substrate. Much as in a Southern hybridization, the target and oligonucleotide probes are of lengths typically greater than about 25 nucleotides. Under appropriate hybridization conditions, e.g., typically higher salt and lower temperature, the probes will hybridize irrespective of imperfect complementarity. In fact, with probes of greater than, e.g., about fifty nucleotides, the difference in stability of different sized probes will be relatively minor.

[0273] Typically the fingerprinting is merely for probing similarity or homology. Thus, the stringency of hybridization can usually be decreased to fairly low levels. See, e.g., Wetmur and Davidson (1968) “Kinetics of Renaturation of DNA,” J. Mol. Biol., 31:349-370; and Kanehisa, M. (1984) Nuc. Acids Res., 12:203-213.

[0274] E. Detection; VLSIPS™ Technology Scanning

[0275] Detection methods will be selected which are appropriate for the selected label. The scanning device output need not necessarily be digitized or placed into a specific digital database, though such would most likely be done. For example, the analysis in fingerprinting could be photographic. Where a standardized fingerprint substrate matrix is used, the pattern of hybridizations may be spatially unique and may be compared photographically. In this manner, each sample may have a characteristic pattern of interactions, and the likelihood of identical patterns will preferably be of such low frequency that the fingerprint pattern indeed becomes a characteristic pattern virtually as unique as an individual's fingertip fingerprint. With a standardized substrate, every individual could be, in theory, uniquely identifiable on the basis of the pattern of hybridizing to the substrate.

[0276] Of course, the VLSIPS™ Technology scanning apparatus may also beuseful to generate a digitized version of the fingerprint pattern. Inthis way, the identification pattern can be provided in a linear stringof digits. This sequence could also be used for a standardizedidentification system providing significant useful medicaltransferability of specific data. In one embodiment, the probes used areselected to be of sufficiently high resolution to measure the antigensof the major histo compatibility complex. It might even be possible toprovide transplantation matching data in a linear stream of data. Thefingerprinting data may provide a condensed version, or summary, of thelinear genetic data, or any other information data base.
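By way of illustration only, the digitized fingerprint described above can be sketched as follows. The grid of intensities, the threshold value, and the function names `digitize` and `hamming_distance` are hypothetical and are not taken from the VLSIPS™ Technology apparatus; the sketch merely assumes that each probe position yields one scalar signal.

```python
def digitize(intensities, threshold):
    """Flatten a 2-D grid of probe signals into a linear string of '0'/'1' digits.

    Positions are read row by row, so two fingerprints are comparable only
    when they come from the same standardized substrate layout.
    """
    return "".join(
        "1" if value >= threshold else "0"
        for row in intensities
        for value in row
    )


def hamming_distance(a, b):
    """Count the positions at which two digitized fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must come from the same substrate layout")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical scanner output for two samples on a 2 x 3 substrate.
sample_a = [[0.9, 0.1, 0.7], [0.2, 0.8, 0.05]]
sample_b = [[0.85, 0.6, 0.75], [0.1, 0.9, 0.0]]

fp_a = digitize(sample_a, threshold=0.5)
fp_b = digitize(sample_b, threshold=0.5)
print(fp_a, fp_b, hamming_distance(fp_a, fp_b))
```

A small distance between two strings suggests similar samples; the choice of threshold and of distance measure is an assumption of this sketch.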

[0277] F. Analysis

[0278] The analysis of the fingerprint will often be much simpler than a total sequence determination. However, there may be particular types of analysis which will be substantially simplified by a selected group of probes. For example, probes which exhibit particular populational heterogeneity may be selected. In this way, analysis may be simplified and practical utility enhanced merely by careful selection of the specific probes and a careful matrix layout of those probes.

[0279] G. Substrate Reuse

[0280] As with the sequencing application, the fingerprinting usages may also take advantage of the reusability of the substrate. The interactions can be disrupted and the substrate treated, so that the renewed substrate is equivalent to an unused substrate.

[0281] H. Non-Polynucleotide Aspects

[0282] Besides polynucleotide applications, the fingerprinting analysis may be applied to other polymers, especially polypeptides, carbohydrates, and other polymers, both organic and inorganic. Besides using the fingerprinting method for analyzing a particular polymer, the fingerprinting method may be used to characterize various samples. For example, a cell or population of cells may be tested for their expression of specific antigens or their mRNA sequence content. For example, a T-cell may be classified by virtue of its combination of expressed surface antigens. With specific reagents which interact with these antigens, a cell or a population of cells or a lysed cell may be exposed to a VLSIPS substrate. The biological sample may be classified or characterized by analyzing the pattern of specific interaction. This may be applicable to a cell or tissue type, to the messenger RNA population expressed by a cell, to the genetic content of a cell, or to virtually any sample which can be classified and/or identified by its combination of specific molecular properties.

[0283] The ability to generate a high density means for screening the presence or absence of specific interactions allows for the possibility of screening for, if not saturating, all of a very large number of possible interactions. This is very powerful in providing the means for testing the combinations of molecular properties which can define a class of samples. For example, a species of organism may be characterized by its DNA sequences, e.g., a genetic fingerprint. By using a fingerprinting method, it may be determined that all members of that species are sufficiently similar in specific sequences that they can be easily identified as being within a particular group. Thus, newly defined classes may be resolved by their similarity in fingerprint patterns. Alternatively, a non-member of that group will fail to share those many identifying characteristics. However, since the technology allows testing of a very large number of specific interactions, it also provides the ability to more finely distinguish between closely related different cells or samples. This will have important applications in diagnosing viral, bacterial, and other pathological or nonpathological infections.

[0284] In particular, cell classification may be defined by any of a number of different properties. For example, a cell class may be defined by the DNA sequences contained therein. This allows species identification for parasitic or other infections. For example, the human cell is presumably genetically distinguishable from a monkey cell, but different human cells will share many genetic markers. At higher resolution, each individual human genome will exhibit unique sequences that can define it as a single individual.

[0285] Likewise, a developmental stage of a cell type may be definable by its pattern of expression of messenger RNA. For example, in particular stages of cells, high levels of ribosomal RNA are found whereas relatively low levels of other types of messenger RNAs may be found. The high resolution distinguishability provided by this fingerprinting method allows the distinction between cells which have relatively minor differences in their expressed mRNA populations. Where a pattern is shown to be characteristic of a stage, a stage may be defined by that particular pattern of messenger RNA expression.

[0286] In a similar manner, the antigenic determinants found on a protein may very well define the cell class. For example, immunological T-cells are distinguishable from B-cells because, in part, the cell surface antigens on the cell types are distinguishable. Different T-cell subclasses can also be distinguished from one another by whether they contain particular T-cell antigens. The present invention provides the possibility for high resolution testing of many different interactions simultaneously, and the definition of new cell types will be possible.

[0287] The high resolution VLSIPS™ substrate may also be used as a very powerful diagnostic tool to test the combined results of a plurality of different assays on a biological sample. For example, a cancerous condition may be indicated by a combination of various different properties found in the blood. For example, a cancerous condition may be indicated by a combination of expression of various soluble antigens found in the blood along with a high number of various cellular antigens found on lymphocytes and/or particular cell degradation products. With a substrate as provided herein, a large number of different features can be simultaneously tested on a biological sample. In fact, the high resolution of the test will allow more complete characterization of parameters which define particular diseases. Thus, the power of diagnostic tests may be limited by the extent of statistical correlation with a particular condition rather than by the number of antigens or interactions which are tested. The present invention provides the means to generate this large universe of possible reagents and the ability to actually accumulate that correlative data.

[0288] In another embodiment, a substrate as provided herein may be used for genetic screening. This would allow for simultaneous screening of thousands of genetic markers. As the density of the matrix is increased, many more molecules can be simultaneously tested. Genetic screening then becomes a simpler method as the present invention provides the ability to screen for thousands, tens of thousands, and hundreds of thousands, even millions of different possible genetic features. However, the high correlation genetic markers for conditions number only in the hundreds. Again, the possibility for screening a large number of sequences provides the opportunity for generating the data which can provide correlation between sequences and specific conditions or susceptibility. The present invention provides the means to generate extremely valuable correlations useful for the genetic detection of the causative mutation leading to medical conditions. In still another embodiment, the present invention would be applicable to distinguishing two individuals having identical genetic compositions. The antibody population within an individual is dependent both on genetic and historical factors. Each individual experiences a unique exposure to various infectious agents, and the combined antibody expression is partly determined thereby. Thus, individuals may also be fingerprinted by their immunological content, either of actively expressed antibodies, or their immunological memory. Similar sorts of immunological and environmental histories may be useful for fingerprinting, perhaps in combination with other screening properties. In particular, the present invention may be useful for screening allergic reactions or susceptibilities, and a simple IgE specificity test may be useful in determining a spectrum of allergies.

[0289] With the definition of new classes of cells, a cell sorter will be used to purify them. Moreover, new markers for defining that class of cells will be identified. For example, where the class is defined by its RNA content, cells may be screened by antisense probes which detect the presence or absence of specific sequences therein. Alternatively, cell lysates may provide information useful in correlating intracellular properties with extracellular markers which indicate functional differences. Using standard cell sorter technology with a fluorescent or otherwise labeled antisense probe which recognizes the internal presence of the specific sequences of interest, the cell sorter will be able to isolate a relatively homogeneous population of cells possessing the particular marker. Using successive probes, the sorting process should be able to select for cells having a combination of a large number of different markers.

[0290] In a non-polynucleotide embodiment, cells may be defined by the presence of other markers. The markers may be carbohydrates, proteins, or other molecules. Thus, a substrate having particular specific reagents, e.g., antibodies, attached to it should be able to identify cells having particular patterns of marker expression. Of course, combinations of these may be utilized and a cell class may be defined by a combination of its expressed mRNA, its carbohydrate expression, its antigens, and other properties. This fingerprinting should be useful in determining the physiological state of a cell or population of cells.

[0291] Having defined a cell type whose function or properties are defined by the reagents attachable to a VLSIPS substrate, such as cellular antigens, these structural manifestations of function may be used to sort cells to generate a relatively homogeneous population of that class of cells. Standard cell sorter technology may be applied to purify such a population, see, e.g., Dangl, J. and Herzenberg (1982) “Selection of hybridomas and hybridoma variants using the fluorescence activated cell sorter,” J. Immunological Methods 52:1-14; and Becton Dickinson, Fluorescence Activated Cell Sorter Division, San Jose, California, and Coulter Diagnostics, Hialeah, Fla.

[0292] With the fingerprinting method, an identification difficulty may arise from mosaicism in an organism. A mosaic organism is one whose genetic content in different cells is significantly different. Various clonal populations should have similar genetic fingerprints, though different clonal populations may have different genetic contents. See, for example, Suzuki et al. An Introduction to Genetic Analysis (4th Ed.), Freeman and Co., New York, which is hereby incorporated herein by reference. However, this problem should be relatively rare and could be more carefully evaluated with greater experience using the fingerprinting methods.

[0293] The invention will also find use in detecting changes, both genetic and antigenic, e.g., in a rapidly “evolving” protozoa infection, or similarly changing organism.

[0294] V. Mapping

[0295] A. General

[0296] The use of the present invention for mapping parallels its use for fingerprinting and sequencing. Where a polymer is a linear molecule, the mapping provides the ability to locate particular segments along the length of the polymer. Branched polymers can be treated as a series of individual linear polymers. The mapping provides the ability to locate, in a relative sense, the order of various subsequences. This may be achieved using at least two different approaches.

[0297] The first approach is to take the large sequence and fragment it at specific points. The fragments are then ordered and attached to a solid substrate. For example, the clones resulting from a chromosome walking process may be individually attached to the substrate by methods, e.g., caged biotin techniques, indicated earlier. Segments of unknown map position will be exposed to the substrate and will hybridize to the segment which contains that particular sequence. This procedure allows the rapid determination of the map positions of a number of different labeled segments, each mapping requiring only a single hybridization step once the substrate is generated. The substrate may be regenerated by removal of the interaction, and the next mapping segment applied.

[0298] In an alternative method, a plurality of subsequences can be attached to a substrate. Various short probes may be applied to determine which segments may contain particular overlaps. The theoretical basis and a description of this mapping procedure are contained in, e.g., Evans et al. (1989) “Physical Mapping of Complex Genomes by Cosmid Multiplex Analysis,” Proc. Natl. Acad. Sci. USA 86:5030-5034, and other references cited above in the Section labeled “Overall Description.” Using this approach, the details of the mapping embodiment are very similar to those used in the fingerprinting embodiment.

[0299] B. Preparation of Substrate Matrix

[0300] The substrate may be generated by either of the methods generally applicable in the sequencing and fingerprinting embodiments. The substrate may be made either synthetically, or by attaching otherwise purified probes or sequences to the matrix. The probes or sequences may be derived either from synthetic or biological means. As indicated above, the solid phase substrate synthetic methods may be utilized to generate a matrix with positionally defined sequences. In the mapping embodiment, saturation of all possible subsequences of a preselected length is far less important than in the sequencing embodiment, but much longer probes may be desired. The processes for making a substrate which has longer oligonucleotide probes should not be significantly different from those described for the sequencing embodiments, but the optimization parameters may be modified to comply with the mapping needs.

[0301] C. Labeling

[0302] The labeling methods will be similar to those applicable in sequencing and fingerprinting embodiments. Again, it may be desirable to fragment the target sequences.

[0303] D. Hybridization/Specific Interaction

[0304] The specificity of interaction between the targets and probes would typically be closer to that used for fingerprinting embodiments, where homology is more important than absolute distinguishability of high fidelity complementary hybridization. Usually, the hybridization conditions will be such that merely homologous segments will interact and provide a positive signal. Much like the fingerprinting embodiment, it may be useful to measure the extent of homology by successive incubations at higher stringency conditions. Alternatively, a plurality of different probes, each having a different level of homology, may be used. Either way, the spectrum of homologies can be measured.

[0305] Where non-nucleic acid hybridization is involved, the specific interactions may also be compared in a fingerprint-like manner. The specific reagents may have less specificity, e.g., monoclonal antibodies which recognize a broader spectrum of sequences may be utilized relative to a sequencing embodiment. Again, the specificity of interaction may be measured under various conditions of increasing stringency to determine the spectrum of matching across the specific probes selected, or a number of different stringency reagents may be included to indicate the binding affinity.
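The successive incubations at increasing stringency may be summarized, as a minimal sketch, by recording for each probe the highest stringency at which its signal is still detected. The stringency labels, the signal values, the threshold, and the function name `homology_spectrum` are hypothetical and serve only to illustrate the bookkeeping.

```python
def homology_spectrum(signals_by_stringency, threshold):
    """Return, per probe, the highest stringency at which a signal persists.

    signals_by_stringency is a list of (label, {probe: signal}) pairs ordered
    from lowest to highest stringency. A probe never detected maps to None.
    """
    spectrum = {}
    for label, signals in signals_by_stringency:
        for probe, signal in signals.items():
            spectrum.setdefault(probe, None)
            if signal >= threshold:
                # Later (more stringent) washes overwrite earlier labels.
                spectrum[probe] = label
    return spectrum


# Hypothetical signals after three successive washes.
washes = [
    ("low", {"p1": 0.9, "p2": 0.8, "p3": 0.1}),
    ("medium", {"p1": 0.8, "p2": 0.3, "p3": 0.0}),
    ("high", {"p1": 0.7, "p2": 0.1, "p3": 0.0}),
]
spectrum = homology_spectrum(washes, threshold=0.5)
print(spectrum)
```

Under these assumed values, a probe retained through the most stringent wash would be read as the most closely homologous to the target.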

[0306] E. Detection

[0307] The detection methods used in the mapping procedure will be virtually identical to those used in the fingerprinting embodiment. The detection methods will be selected in combination with the labeling methods.

[0308] F. Analysis

[0309] The analysis of the data in a mapping embodiment will typically be somewhat different from that in fingerprinting. The fingerprinting embodiment will test for the presence or absence of specific or homologous segments. However, in the mapping embodiment, the existence of an interaction is coupled with some indication of the location of the interaction. The interaction is mapped in some manner to the physical polymer sequence. Some means for determining the relative positions of different probes is provided. This may be achieved by synthesis of the substrate in a pattern, or may result from analysis of sequences after they have been attached to the substrate.

[0310] For example, the probes may be randomly positioned at various locations on the substrate. However, the relative positions of the various reagents in the original polymer may be determined by using short fragments, e.g., individually, as target molecules which determine the proximity of different probes. By an automated system of testing each different short fragment of the original polymer, coupled with proper analysis, it will be possible to determine which probes are adjacent one another on the original target sequence and correlate that with positions on the matrix. In this way, the matrix is useful for determining the relative locations of various new segments in the original target molecule. This sort of analysis is described in Evans et al., and the related references described above.
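The adjacency analysis just described can be illustrated by a minimal sketch. It assumes, purely for illustration, that each short fragment hybridizes to at most two probes and that the probes form a single unbranched chain; the function name `order_probes` and the probe labels are hypothetical and do not represent the method of Evans et al.

```python
from collections import defaultdict


def order_probes(fragment_hits):
    """Infer a linear probe order from probes co-hit by the same short fragment.

    fragment_hits is an iterable of sets of probe names, one set per fragment.
    Two probes hit by one fragment are treated as adjacent on the polymer.
    The returned order starts at the alphabetically smaller chain end, since
    the orientation of the chain cannot be determined from adjacency alone.
    """
    neighbors = defaultdict(set)
    for hits in fragment_hits:
        hits = sorted(hits)
        if len(hits) == 1:
            neighbors[hits[0]]  # register the probe without adding an edge
        elif len(hits) == 2:
            a, b = hits
            neighbors[a].add(b)
            neighbors[b].add(a)
        else:
            raise ValueError("this sketch assumes one or two probes per fragment")

    if any(len(n) > 2 for n in neighbors.values()):
        raise ValueError("branched adjacency; probes do not form a linear chain")
    ends = sorted(p for p, n in neighbors.items() if len(n) == 1)
    if len(ends) != 2:
        raise ValueError("probes do not form a single linear chain")

    order, previous = [ends[0]], None
    while True:
        candidates = [p for p in neighbors[order[-1]] if p != previous]
        if not candidates:
            break
        previous = order[-1]
        order.append(candidates[0])
    if len(order) != len(neighbors):
        raise ValueError("some probes are not connected to the chain")
    return order


# Hypothetical fragments, each bridging two probes on the original polymer.
hits = [{"C", "B"}, {"A", "B"}, {"D", "C"}]
probe_order = order_probes(hits)
print(probe_order)
```

The recovered order can then be correlated with the known positions of the probes on the matrix.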

[0311] G. Substrate Reuse

[0312] The substrate should be reusable in the manner described in the fingerprinting section. The substrate is renewed by removal of the specific interactions and is washed and prepared for successive cycles of exposure to new target sequences.

[0313] H. Non-Polynucleotide Aspects

The mapping procedure may be used on molecules other than polynucleotides. Although hybridization is one type of specific interaction which is clearly useful in this mapping embodiment, antibody reagents may also be very useful. In the same way that polypeptides or other polymers may be sequenced by the reagents and techniques described in the sequencing section and fingerprinting section, the mapping embodiment may also be applied to such polymers.

[0314] In another form of mapping, as described above in the fingerprinting section, the developmental map of a cell or biological system may be measured using fingerprinting type technology. Thus, the mapping may be along a temporal dimension rather than along a polymer dimension. The mapping or fingerprinting embodiments may also be used in determining the genetic rearrangements which may be genetically important, as in lymphocyte and B-cell development. In another example, various rearrangements or chromosomal dislocations may be tested by either the fingerprinting or mapping methods. These techniques are similar in many respects and the fingerprinting and mapping embodiments may overlap in many respects.

[0315] VI. Additional Screening and Applications

[0316] A. Specific Interactions

[0317] As originally indicated in the parent filing of VLSIPS™ Technology, the production of a high density plurality of spatially segregated polymers provides the ability to generate a very large universe or repertoire of individual and distinct sequence possibilities. As indicated above, particular oligonucleotides may be synthesized in automated fashion at specific locations on a matrix. In fact, these oligonucleotides may be used to direct other molecules to specific locations by linking specific oligonucleotides to other reagents which are in batch exposed to the matrix and hybridized in a complementary fashion to only those locations where the complementary oligonucleotide has been synthesized on the matrix. This allows for spatially attaching a plurality of different reagents onto the matrix instead of individually attaching each separate reagent at each specific location. Although the caged biotin method allows automated attachment, the speed of the caged biotin attachment process is relatively slow and requires a separate reaction for each reagent being attached. By use of the oligonucleotide method, positional specificity can be achieved in an automated and parallel fashion. As each reagent is produced, instead of directly attaching each reagent at each desired position, the reagent may be attached to a specific desired complementary oligonucleotide which will ultimately be specifically directed toward locations on the matrix having a complementary oligonucleotide attached thereat.

[0318] In addition, the technology allows screening for specificity of interaction with particular reagents. For example, the oligonucleotide sequence specificity of binding of a potential reagent may be tested by presenting to the reagent all of the possible subsequences available for binding. Although secondary or higher order sequence specific features might not be easily screenable using this technology, it does provide a convenient, simple, quick, and thorough screen of interactions between a reagent and its target recognition sequences. See, e.g., Pfeifer et al. (1989) Science 246:810-812.

[0319] For example, the interaction of a promoter protein with its target binding sequence may be tested for many different, or all, possible binding sequences. By testing the strength of interactions under various different conditions, the interaction of the promoter protein with each of the different potential binding sites may be analyzed. The spectrum of strength of interactions with each different potential binding site may provide significant insight into the types of features which are important in determining specificity.

[0320] An additional example of a sequence specific interaction between reagents is the testing of binding of a double stranded nucleic acid structure with a single stranded oligonucleotide. Often, a triple stranded structure is produced which has significant aspects of sequence specificity. Testing of such interactions with either sequences comprising only natural nucleotides, or perhaps the testing of nucleotide analogs, may be very important in screening for particularly useful diagnostic or therapeutic reagents. See, e.g., Haner and Dervan (1990) Biochemistry 29:9761-9765, and references therein.
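The oligonucleotide-directed placement of reagents may be pictured with the following minimal sketch, which assumes idealized, perfectly complementary hybridization. The sequences, reagent names, and the function names `reverse_complement` and `address_reagents` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def address_reagents(matrix_oligos, tagged_reagents):
    """Predict where each oligonucleotide-tagged reagent would hybridize.

    matrix_oligos maps a matrix position to the oligonucleotide synthesized
    there; tagged_reagents maps a reagent name to its oligonucleotide tag.
    A reagent whose tag has no complement on the matrix is left unplaced.
    """
    position_by_tag = {
        reverse_complement(oligo): position
        for position, oligo in matrix_oligos.items()
    }
    return {
        position_by_tag[tag]: reagent
        for reagent, tag in tagged_reagents.items()
        if tag in position_by_tag
    }


# Hypothetical matrix with two addressed positions and three tagged reagents
# exposed to it in a single batch.
matrix = {(0, 0): "AAGGCT", (0, 1): "GATTAC"}
reagents = {
    "antibody_1": "GTAATC",
    "antibody_2": "AGCCTT",
    "antibody_3": "CCCCCC",
}
placement = address_reagents(matrix, reagents)
print(placement)
```

All reagents are applied together, and the complementarity of each tag alone determines its final location, which is the parallelism the paragraph contrasts with one-at-a-time attachment.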
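As a brief illustration of what presenting "all of the possible subsequences" entails, the following sketch enumerates every oligonucleotide of a chosen length over the four natural bases. The function name `all_subsequences` and the chosen length are illustrative assumptions; the count grows as 4 raised to the probe length.

```python
from itertools import product


def all_subsequences(length, alphabet="ACGT"):
    """Yield every possible sequence of the given length over the alphabet."""
    for bases in product(alphabet, repeat=length):
        yield "".join(bases)


# Every possible trinucleotide; a matrix saturating this length would need
# one position per sequence.
trimers = list(all_subsequences(3))
print(len(trimers), trimers[0], trimers[-1])
```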

[0321] B. Sequence Comparisons

[0322] Once a gene is sequenced, the present invention provides a means to compare alleles or related sequences to locate and identify differences from the control sequence. This would be extremely useful in further analysis of genetic variability at a specific gene locus.

[0323] C. Categorizations

[0324] As indicated above in the fingerprinting and mapping embodiments, the present invention is also useful in defining specific stages in the temporal sequence of cells, e.g., development, and the resulting tissues within an organism. For example, the developmental stage of a cell, or population of cells, can be dependent upon the expression of particular messenger RNAs or cellular antigens. The screening procedures provided allow for high resolution definition of new classes of cells. In addition, the temporal development of particular cells will be characterized by the presence or expression of various mRNAs. Means to simultaneously screen a plurality or very large number of different sequences are provided. The combination of different markers made available dramatically increases the ability to distinguish fairly closely related cell types. Other markers may be combined with markers and methods made available herein to define new classifications of biological samples, e.g., based upon new combinations of markers.

[0325] The presence or absence of particular marker sequences will be used to define temporal developmental stages. Once the stages are defined, fairly simple methods can be applied to actually purify those particular cells. For example, antisense probes or recognition reagents may be used with a cell sorter to select those cells containing or expressing the critical markers. Alternatively, the expression of those sequences may result in specific antigens which may also be used in defining cell classes and sorting those cells away from others. In this way, for example, it should be possible to select a class of omnipotent immune system cells which are able to completely regenerate a human immune system. Based upon the cellular classes defined by the parameters made available by this technology, purified classes of cells having identifiable differences, structural or functional, are made available.

[0326] In an alternative embodiment, a plurality of antigens or specific binding proteins attached to the substrate may be used to define particular cell types. For example, subclasses of T-cells are defined, in part, by the combination of expressed cell surface antigens. The present invention allows for the simultaneous screening of a large plurality of different antigens together. Thus, higher resolution classification of different T-cell subclasses becomes possible and, with the definitions and functional differences which correlate with those antigenic or other parameters, the ability to purify those cell types becomes available. This is applicable not only to T-cells, but also to lymphocyte cells, or even to freely circulating cells. Many of the cells for which this would be most useful will be immobile cells found in particular tissues or organs. Tumor cells will be diagnosed or detected using these fingerprinting techniques. Coupled with a temporal change in structure, developmental classes may also be selected and defined using these technologies. The present invention also provides the ability not only to define new classes of cells based upon functional or structural differences, but it also provides the ability to select or purify populations of cells which share these particular properties. Standard cell sorting procedures using antibody markers may be used to detect extracellular features. Intracellular features would also be detectable by introducing the label reagents into the cell. In particular, antisense DNA or RNA molecules may be introduced into a cell to detect RNA sequences therein. See, e.g., Weintraub (1990) Scientific American 262:40-46.

[0327] D. Statistical Correlations

[0328] In an additional embodiment, the present invention also allows for the high resolution correlation of medical conditions with various different markers. For example, the presently available technology, when applied to amniocentesis or other genetic screening methods, typically screens for tens of different markers at most. The present invention allows simultaneous screening for tens, hundreds, thousands, tens of thousands, hundreds of thousands, and even millions of different genetic sequences. Thus, applying the fingerprinting methods of the present invention to a sufficiently large population allows detailed statistical analysis to be made, thereby correlating particular medical conditions with particular markers, typically antigenic or genetic. Tumor specific antigens will be identified using the present invention.

[0329] Various medical conditions may be correlated against an enormous data base of the sequences within an individual. Genetic propensities and correlations then become available and high resolution genetic predictability and correlation become much more easily performed. With the enormous data base, the reliability of the predictions is also better tested. Particular markers which are partially diagnostic of particular medical conditions or medical susceptibilities will be identified and provide direction in further studies and more careful analysis of the markers involved. Of course, as indicated above in the sequencing embodiment, the present invention will find much use in intense sequencing projects. For example, sequencing of the entire human genome in the human genome project will be greatly simplified and enabled by the present invention.
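One simple way such a correlation might be computed, offered only as a sketch, is a two-by-two contingency table for each marker scored by the phi coefficient. The choice of statistic, the sample data, and the function names `contingency` and `phi_coefficient` are assumptions of this illustration and are not prescribed by the method described here.

```python
import math


def contingency(samples, marker):
    """Count samples by marker presence and condition status.

    samples is a list of (set_of_markers, has_condition) pairs. Returns
    (a, b, c, d): marker and condition, marker only, condition only, neither.
    """
    a = b = c = d = 0
    for markers, has_condition in samples:
        present = marker in markers
        if present and has_condition:
            a += 1
        elif present:
            b += 1
        elif has_condition:
            c += 1
        else:
            d += 1
    return a, b, c, d


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2 x 2 table; 0.0 when a row or column is empty."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        return 0.0
    return (a * d - b * c) / denominator


# Hypothetical population of four fingerprinted individuals.
population = [
    ({"m1", "m2"}, True),
    ({"m1"}, True),
    ({"m2"}, False),
    (set(), False),
]
phi_m1 = phi_coefficient(*contingency(population, "m1"))
phi_m2 = phi_coefficient(*contingency(population, "m2"))
print(phi_m1, phi_m2)
```

With a realistic population, the same table would be computed for every marker on the substrate and the results adjusted for the number of markers tested.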

[0330] VII. Formation of Substrate

[0331] The substrate is provided with a pattern of specific reagents which are positionally localized on the surface of the substrate. This matrix of positions is defined by the automated system which produces the substrate. The instrument will typically be one similar to that described in Pirrung et al. (1992) U.S. Pat. No. 5,143,854, and Ser. No. 07/624,120, now abandoned. The instrumentation described therein is directly applicable to the applications used here. In particular, the apparatus comprises a substrate, typically a silicon containing substrate, on which positions on the surface may be defined by a coordinate system of positions. These positions can be individually addressed or detected by the VLSIPS™ Technology apparatus.

[0332] Typically, the VLSIPS™ Technology apparatus uses optical methods used in semiconductor fabrication applications. In this way, masks may be used to photo-activate positions for attachment or synthesis of specific sequences on the substrate. These manipulations may be automated by the types of apparatus described in Pirrung et al. (1992) U.S. Pat. No. 5,143,854 and Ser. No. 07/624,120, now abandoned.

[0333] Selectively removable protecting groups allow creation of well defined areas of substrate surface having differing reactivities. Preferably, the protecting groups are selectively removed from the surface by applying a specific activator, such as electromagnetic radiation of a specific wavelength and intensity. More preferably, the specific activator exposes selected areas of surface to remove the protecting groups in the exposed areas.

[0334] Protecting groups of the present invention are used in conjunction with solid phase oligomer syntheses, such as peptide syntheses using natural or unnatural amino acids, nucleotide syntheses using deoxyribonucleic and ribonucleic acids, oligosaccharide syntheses, and the like. In addition to protecting the substrate surface from unwanted reaction, the protecting groups block a reactive end of the monomer to prevent self-polymerization. For instance, attachment of a protecting group to the amino terminus of an activated amino acid, such as the N-hydroxysuccinimide-activated ester of the amino acid, prevents the amino terminus of one monomer from reacting with the activated ester portion of another during peptide synthesis.

[0335] Alternatively, the protecting group may be attached to the carboxyl group of an amino acid to prevent reaction at this site. Most protecting groups can be attached to either the amino or the carboxyl group of an amino acid, and the nature of the chemical synthesis will dictate which reactive group will require a protecting group. Analogously, attachment of a protecting group to the 5′-hydroxyl group of a nucleoside during synthesis using, for example, phosphate-triester coupling chemistry, prevents the 5′-hydroxyl of one nucleoside from reacting with the 3′-activated phosphate-triester of another.

[0336] Regardless of the specific use, protecting groups are employed to protect a moiety on a molecule from reacting with another reagent. Protecting groups of the present invention have the following characteristics: they prevent selected reagents from modifying the group to which they are attached; they are stable (that is, they remain attached) to the synthesis reaction conditions; they are removable under conditions that do not adversely affect the remaining structure; and once removed, do not react appreciably with the surface or surface-bound oligomer. The selection of a suitable protecting group will depend, of course, on the chemical nature of the monomer unit and oligomer, as well as the specific reagents they are to protect against.

[0337] In a preferred embodiment, the protecting groups will be photoactivatable. The properties and uses of photoreactive protecting compounds have been reviewed. See, McCray et al., Ann. Rev. of Biophys. and Biophys. Chem. (1989) 18:239-270, which is incorporated herein by reference. Preferably, the photosensitive protecting groups will be removable by radiation in the ultraviolet (UV) or visible portion of the electromagnetic spectrum. More preferably, the protecting groups will be removable by radiation in the near UV or visible portion of the spectrum. In some embodiments, however, activation may be performed by other methods such as localized heating, electron beam lithography, laser pumping, oxidation or reduction with microelectrodes, and the like. Sulfonyl compounds are suitable reactive groups for electron beam lithography. Oxidative or reductive removal is accomplished by exposure of the protecting group to an electric current source, preferably using microelectrodes directed to the predefined regions of the surface which are desired for activation. A more detailed description of these protective groups is provided in Ser. No. 07/624,120, now abandoned, which is hereby incorporated herein by reference.

[0338] The density of reagents attached to a silicon substrate may be varied by standard procedures. The surface area for attachment of reagents may be increased by modifying the silicon surface. For example, a matte surface may be machined or etched on the substrate to provide more sites for attachment of the particular reagents. Another way to increase the density of reagent binding sites is to increase the derivatization density of the silicon. Standard procedures for achieving this are described below.

[0339] One method to control the derivatization density is to derivatize the substrate with photochemical groups at high density. The substrate is then photolyzed for various predetermined times, which photoactivates the groups at a measurable rate, and the activated groups are reacted with a capping reagent. By this method, the density of linker groups may be modulated by using a desired time and intensity of photoactivation.
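As an illustrative sketch only, the dependence of linker density on exposure time and intensity can be modeled by assuming simple first-order photolysis kinetics. The first-order model, the rate constant k_cm2_per_mj, and the function names below are assumptions made for illustration; they are not taken from the procedures described herein, and the rate constant would have to be measured for a given photochemical group.

```python
import math


def remaining_fraction(intensity_mw_cm2, time_s, k_cm2_per_mj):
    """Fraction of photochemical groups not yet photoactivated (and so not capped).

    Assumes first-order kinetics in the delivered dose. k_cm2_per_mj is a
    hypothetical rate constant, not a value from the specification.
    """
    dose_mj_cm2 = intensity_mw_cm2 * time_s  # mW/cm^2 * s = mJ/cm^2
    return math.exp(-k_cm2_per_mj * dose_mj_cm2)


def exposure_time_s(target_fraction, intensity_mw_cm2, k_cm2_per_mj):
    """Exposure time that leaves target_fraction of the groups uncapped."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    return -math.log(target_fraction) / (k_cm2_per_mj * intensity_mw_cm2)
```

Under this assumed model, halving the intensity doubles the exposure time needed to reach the same linker density, since only the product of intensity and time enters the calculation.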

[0340] In many applications, the number of different sequences which may be provided may be limited by the density and the size of the substrate on which the matrix pattern is generated. In situations where the density is insufficiently high to allow the screening of the desired number of sequences, a plurality of different substrates may be used to increase the number of sequences tested. Because the VLSIPS apparatus is almost fully automated, increasing the number of substrates does not lead to a significant increase in the number of manipulations which must be performed by humans. This again leads to greater reproducibility and speed in the handling of these multiple substrates.
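The number of substrates required follows from simple arithmetic. The sketch below assumes, for illustration, that the complete set of oligonucleotides of a given length is to be screened and that each substrate offers a fixed number of synthesis sites; the function name and the sites_per_substrate value are hypothetical.

```python
def substrates_needed(oligo_length, sites_per_substrate, alphabet_size=4):
    """Number of substrates needed to hold every oligomer of a given length.

    alphabet_size is 4 for the natural nucleotides; sites_per_substrate is a
    hypothetical capacity figure supplied by the user.
    """
    if sites_per_substrate <= 0:
        raise ValueError("sites_per_substrate must be positive")
    total_sequences = alphabet_size ** oligo_length
    # Ceiling division using integers only.
    return -(-total_sequences // sites_per_substrate)
```

For example, all 65,536 possible octanucleotides would occupy seven substrates of 10,000 sites each under these assumptions.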

[0341] A. Instrumentation

[0342] The concept of using VLSIPS™ Technology generally allows a pattern or a matrix of reagents to be generated. The procedure for making the pattern is performed by any of a number of different methods. An apparatus and instrumentation useful for generating a high density VLSIPS substrate is described in detail in Pirrung et al. (1992) U.S. Pat. No. 5,143,854 and Ser. No. 07/624,120, now abandoned.

[0343] B. Binary Masking

[0344] The details of the binary masking are described in an accompanying application filed simultaneously with this, Ser. No. 07/624,120, now abandoned, whose specification is incorporated herein by reference.

[0345] For example, the binary masking technique allows for producing a plurality of sequences based on the selection of either of two possibilities at any particular location. In a series of binary masking steps, the binary decision may be the determination, on a particular synthetic cycle, whether or not to add any particular one of the possible subunits. By treating various regions of the matrix pattern in parallel, the binary masking strategy provides the ability to carry out spatially addressable parallel synthesis.
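The binary decision at each cycle can be sketched as follows. This is a simplified illustration, not the masking procedure of the incorporated application: it assumes one monomer is offered per cycle, that each region either receives it or does not, and that the bits of a region index encode its mask history. The function name is hypothetical.

```python
def binary_masking(monomers):
    """Enumerate the products of a binary masking synthesis.

    monomers: the subunit offered at each synthetic cycle, in order of addition.
    Region r receives the monomer of cycle i only if bit i of r is set, so
    n cycles yield 2**n regions, each holding a distinct subset of additions.
    """
    n_cycles = len(monomers)
    products = {}
    for region in range(2 ** n_cycles):
        products[region] = "".join(
            monomer
            for cycle, monomer in enumerate(monomers)
            if (region >> cycle) & 1
        )
    return products
```

With three cycles offering A, C, and G in turn, the eight regions would carry the empty product, A, C, AC, G, AG, CG, and ACG, all produced in only three coupling steps.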

[0346] C. Synthetic Methods

[0347] The synthetic methods in making a substrate are described in the parent application, Pirrung et al. (1992) U.S. Pat. No. 5,143,854. The matrix pattern on the substrate will typically be generated by the use of photo-sensitive reagents. By use of photo-lithographic optical methods, particular segments of the substrate can be irradiated with light to activate or deactivate blocking agents, e.g., to protect or deprotect particular chemical groups. By an appropriate sequence of photo-exposure steps at appropriate times with appropriate masks and with appropriate reagents, the substrates can have known polymers synthesized at positionally defined regions on the substrate. Methods for synthesizing various substrates are described in Pirrung et al. (1992) U.S. Pat. No. 5,143,854 and Ser. No. 07/624,120, now abandoned. By a sequential series of these photo-exposure and reaction manipulations, a defined matrix pattern of known sequences may be generated, and is typically referred to as a VLSIPS™ Technology substrate. In the nucleic acid synthesis embodiment, nucleosides used in the synthesis of DNA by photolytic methods will typically be one of the two forms shown below:

[0348] B=Adenine, Cytosine, Guanine, or Thymine

[0349] In I, the photolabile group at the 5′ position is abbreviated NV (nitroveratryl) and in II, the group is abbreviated NVOC (nitroveratryloxycarbonyl). Although not shown in FIG. C, the bases (adenine, cytosine, and guanine) contain exocyclic NH₂ groups which must be protected during DNA synthesis. Thymine contains no exocyclic NH₂ and therefore requires no protection. The standard protecting groups for these amines are shown below:

[0350] Other amides of the general formula

[0351] where R may be alkyl or aryl have been used.

[0352] Another type of protecting group, FMOC (9-fluorenylmethoxycarbonyl), is currently being used to protect the exocyclic amines of the three bases:

[0353] Adenine (A) Cytosine (C) Guanine (G)

[0354] The advantage of the FMOC group is that it is removed under mild conditions (dilute organic bases) and can be used for all three bases. The amide protecting groups require harsher conditions to be removed (NH₃/MeOH with heat).

[0355] Nucleosides used as 5′-OH probes, useful in verifying correct VLSIPS synthetic function, include, for example, the following:

[0356] These compounds are used to detect where on a substrate photolysis has occurred by the attachment of either III or V to the newly generated 5′-OH. In the case of III, after the phosphate attachment is made, the substrate is treated with a dilute base to remove the FMOC group. The resulting amine can be reacted with FITC and the substrate examined by fluorescence microscopy. This indicates the proper generation of a 5′-OH. In the case of compound IV, after the phosphate attachment is made, the substrate is treated with FITC labeled streptavidin and the substrate again may be examined by fluorescence microscopy. Other probes, although not nucleoside based, have included the following:

[0357] The method of attachment of the first nucleoside to the surface of the substrate depends on the functionality of the groups at the substrate surface. If the surface is amine functionalized, an amide bond is made (see example below).

[0358] If the surface is hydroxy functionalized, a phosphate bond is made (see example below).

[0359] In both cases, the thymidine example is illustrated, but any one of the four phosphoramidite activated nucleosides can be used in the first step.

[0360] Photolysis of the photolabile group NV or NVOC on the 5′ positions of the nucleosides is carried out at ~362 nm with an intensity of 14 mW/cm² for 10 minutes with the substrate side (side containing the photolabile group) immersed in dioxane. After the coupling of the next nucleoside is complete, the photolysis is repeated followed by another coupling until the desired oligomer is obtained.
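For reference, the exposure conditions above correspond to a delivered energy dose that can be computed directly from intensity and time. The helper below is an illustrative convenience with a hypothetical name; it performs only the unit conversion and assumes constant intensity over the exposure.

```python
def photolysis_dose_j_cm2(intensity_mw_cm2, minutes):
    """Energy dose in J/cm^2 for a constant-intensity exposure."""
    seconds = minutes * 60
    return intensity_mw_cm2 * seconds / 1000  # mJ/cm^2 -> J/cm^2
```

An intensity of 14 mW/cm² applied for 10 minutes delivers about 8.4 J/cm² per photolysis step.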

[0361] One of the most common 3′-O-protecting groups is the ester, in particular the acetate:

[0362] The groups can be removed by mild base treatment (0.1 N NaOH/MeOH or K₂CO₃/H₂O/MeOH).

[0363] Another group used most often is the silyl ether:

[0364] These groups can be removed under neutral conditions using 1 M tetra-n-butylammonium fluoride in THF or under acid conditions.

[0365] With respect to photodeprotection, the nitroveratryl group could also be used to protect the 3′-position.

[0366] Here, light (photolysis) would be used to remove these protecting groups.

[0367] A variety of ethers can also be used in the protection of the 3′-O-position:

[0368] Removal of these groups usually involves acid or catalytic methods.

[0369] Note that corresponding linkages and photoblocked amino acids are described in detail in Ser. No. 07/624,120, now abandoned, which is hereby incorporated herein by reference.

[0370] Although the specificity of interactions at particular locations will usually be homogeneous due to a homogeneous polymer being synthesized at each defined location, for certain purposes, it may be useful to have mixed polymers with a commensurate mixed collection of interactions occurring at specific defined locations, or degeneracy reducing analogues, which have been discussed above and show broad specificity in binding. Then, a positive interaction signal may result from any of a number of sequences contained therein.

[0371] As an alternative method of generating a matrix pattern on a substrate, preformed polymers may be individually attached at particular sites on the substrate. This may be performed by individually attaching reagents one at a time to specific positions on the matrix, a process which may be automated. See, e.g., Ser. No. 07/435,316, now abandoned, and Barrett et al. (1993) U.S. Pat. No. 5,252,743. Another way of generating a positionally defined matrix pattern on a substrate is to have individually specific reagents which interact with each specific position on the substrate. For example, oligonucleotides may be synthesized at defined locations on the substrate. Then the substrate would have on its surface a plurality of regions having homogeneous oligonucleotides attached at each position.

[0372] In particular, at least four different substrate preparation procedures are available for treating a substrate surface. They are the standard VLSIPS™ Technology method, polymeric substrates, Durapore™, and synthetic beads or fibers. The treatment labeled “standard VLSIPS™ Technology” method is described in Ser. No. 07/624,120, now abandoned, and involves applying aminopropyltriethoxysilane to a glass surface.

[0373] The polymeric substrate approach involves either of two ways of generating a polymeric substrate. The first uses a high concentration of aminopropyltriethoxysilane (2-20%) in an aqueous ethanol solution (95%). This allows the silane compound to polymerize both in solution and on the substrate surface, which provides a high density of amines on the surface of the glass. This density is contrasted with the standard VLSIPS method, which deposits a monolayer on the substrate surface due to the anhydrous method used with the aforementioned silane.

[0374] The second polymeric method involves either the coating or covalent binding of an appropriate acrylic acid polymer onto the substrate surface. In particular, e.g., in DNA synthesis, a monomer such as a hydroxypropylacrylate is used to generate a high density of hydroxyl groups on the substrate surface, allowing for the formation of phosphate bonds. An example of such a compound is shown:

[0375] The method using a Durapore™ membrane (Millipore) employs a membrane consisting of a polyvinylidene difluoride coating with crosslinked polyhydroxypropyl acrylate [PVDF-HPA]:

[0376] Here the building up of, e.g., a DNA oligomer, can be started immediately since phosphate bonds to the surface can be accomplished in the first step with no need for modification. A nucleotide dimer (5′-C-T-3′) has been successfully made on this substrate.

[0377] The fourth method utilizes synthetic beads or fibers. This would use another substrate, such as a Teflon copolymer graft bead or fiber, which is covalently coated with an organic layer (hydrophilic) terminating in hydroxyl sites (commercially available from Molecular Biosystems, Inc.). This would offer the same advantage as the Durapore™ membrane, allowing for immediate phosphate linkages, but would give additional contour by the 3-dimensional growth of oligomers.

[0378] A matrix pattern of new reagents may be targeted to each specific oligonucleotide position by attaching to each reagent an oligonucleotide complementary to the substrate bound form. For instance, a number of regions may have homogeneous oligonucleotides synthesized at various locations. Oligonucleotide sequences complementary to each of these can be individually generated and linked to particular specific reagents. Often these specific reagents will be antibodies. As each of these is specific for finding its complementary oligonucleotide, each of the specific reagents will bind through the oligonucleotide to the appropriate matrix position. In a single step, a combination of different specific reagents, each attached specifically to a particular oligonucleotide, will thereby bind to their complements at the defined matrix positions. The oligonucleotides will typically then be covalently attached, using, e.g., an acridine dye, for photocrosslinking. Psoralen is commonly used for photocrosslinking purposes; see, e.g., Song et al. (1979) Photochem. Photobiol. 29:1177-1197; Cimino et al. (1985) Ann. Rev. Biochem. 54:1151-1193; Parsons (1980) Photochem. Photobiol. 32:813-821; and Dattagupta et al. (1985) U.S. Pat. No. 4,542,102, and (1987) U.S. Pat. No. 4,713,326; each of which is hereby incorporated herein by reference. This method allows a single attachment manipulation to attach all of the specific reagents to the matrix at defined positions and results in the specific reagents being homogeneously located at defined positions.

[0379] In an alternative embodiment, antibody molecules may be used to specifically direct binding to defined positions on a substrate. The VLSIPS technology may be used to generate specific epitopes at each position on the substrate. Antibody molecules having specificity of interaction may be used to attach oligonucleotides, thereby avoiding the interference of internal polynucleotide sequences from binding to the substrate complementary oligonucleotides. In fact, the specificity of interaction for positional targeting may be achieved by use of nucleotide analogues which do not interact with the natural nucleotides. For example, other synthetic nucleotides have been made which undergo base pairing, thereby providing the specificity of targeting, but which do not interact with the natural biological nucleotides. Thus, synthetic oligonucleotides would be useful for attachment to biological nucleotides and specific targeting. Moreover, the VLSIPS synthetic processes would be useful in generating the VLSIPS substrate, and standard oligonucleotide synthesis could be applied, with minor modifications, to produce the complementary sequences which would be attached to other specific reagents.

[0380] D. Surface Immobilization

[0381] 1. Caged Biotin

[0382] An alternative method of attaching reagents in a positionally defined matrix pattern is to use a caged biotin system. See Barrett et al. (1993) U.S. Pat. No. 5,252,743, which is hereby incorporated herein by reference, for additional details on the chemistry and application of caged biotin embodiments. In short, the caged biotin has a photosensitive blocking moiety which prevents the binding of avidin to biotin. At positions where the photo-lithographic process has removed the blocking group, high affinity biotin sites are generated. Thus, by a sequential series of photolithographic deblocking steps interspersed with exposure of those regions to appropriate biotin containing reagents, only those locations where the deblocking takes place will form an avidin-biotin interaction. Because the avidin-biotin binding is very tight, this will usually be virtually irreversible binding.

[0383] 2. Crosslinked Interactions

[0384] The surface immobilization may also take place by photocrosslinking of defined oligonucleotides linked to specific reagents. After hybridization of the complementary oligonucleotides, the oligonucleotides may be crosslinked by a reagent such as psoralen or an acridine dye. Other useful crosslinking reagents are described in Dattagupta et al. (1985) U.S. Pat. No. 4,542,102, and (1987) U.S. Pat. No. 4,713,326.

[0385] In another embodiment, biological polymers from a colony or phage plaque may be transferred directly onto a silicon substrate. For example, a colony plate may be transferred onto a substrate having a generic oligonucleotide sequence which hybridizes to another generic complementary sequence contained on all of the vectors into which inserts are cloned. This will specifically bind only those molecules which are actually contained in the vectors containing the desired complementary sequence. This immobilization allows for producing a matrix onto which a sequence specific reagent can bind, or for other purposes. In a further embodiment, a plurality of different vectors each having a specific oligonucleotide attached to the vector may be specifically attached to particular regions on a matrix having a complementary oligonucleotide attached thereto.

[0386] VIII. Hybridization/Specific Interaction

[0387] A. General

[0388] As discussed previously in the VLSIPS™ Technology parent applications, the VLSIPS™ technology substrates may be used for screening for specific interactions with sequence specific targets or probes.

[0389] In addition, the availability of substrates having the entire repertoire of possible sequences of a defined length opens up the possibility of sequencing by hybridization. This sequencing may be a de novo determination of an unknown sequence, particularly of nucleic acid, a verification of a sequence determined by another method, or an investigation of changes in a previously sequenced gene, locating and identifying specific changes. For example, often Maxam and Gilbert sequencing techniques are applied to sequences which have been determined by Sanger and Coulson. Each of those sequencing technologies has problems with resolving particular types of sequences. Sequencing by hybridization may serve as a third and independent method for verifying other sequencing techniques. See, e.g., (1988) Science 242:1245.
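The principle of sequencing by hybridization can be sketched with a deliberately simplified model. The code below assumes an idealized, error-free experiment in which the set of all length-k subsequences of the target is known, each occurs once, and each overlap of k-1 bases identifies a unique successor; the function names are hypothetical, and real data would require handling of repeats, mismatches, and ambiguous branches.

```python
def spectrum(sequence, k):
    """Set of all length-k subsequences (k-mers) present in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence from its k-mer spectrum by chaining (k-1) overlaps.

    Raises ValueError when the start or any extension is not unique, which is
    where this simple approach stops working.
    """
    kmers = set(kmers)
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the one whose prefix is not the suffix of any k-mer.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting k-mer")
    current = starts[0]
    result = current
    unused = kmers - {current}
    while unused:
        candidates = [kmer for kmer in unused if kmer[:-1] == current[1:]]
        if len(candidates) != 1:
            raise ValueError("ambiguous or broken overlap")
        current = candidates[0]
        result += current[-1]
        unused.remove(current)
    return result
```

In this model, the trinucleotides ATG, TGC, GCA, CAG, and AGT overlap in only one order and so determine the sequence ATGCAGT.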

[0390] In addition, the ability to provide a large repertoire of particular sequences allows use of short subsequences and hybridization as a means to fingerprint a sample. This may be used in a nucleic acid, as well as other polymer embodiments. For example, fingerprinting to a high degree of specificity of sequence matching may be used for identifying highly similar samples, e.g., those exhibiting high homology to the selected probes. This may provide a means for determining classifications of particular sequences. This should allow determination of whether particular genomes of bacteria, phage, or even higher cells might be related to one another.
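One way to picture fingerprint comparison is to treat each fingerprint as the set of probes that give a positive signal and to score the overlap between two such sets. The sketch below is an assumption-laden illustration: it models hybridization as an exact substring match on a single strand, and it uses the Jaccard index as the similarity measure, neither of which is prescribed by the present description.

```python
def fingerprint(sample, probes):
    """Set of probe sequences found in the sample (exact-match model)."""
    return frozenset(probe for probe in probes if probe in sample)


def similarity(fingerprint_a, fingerprint_b):
    """Jaccard index of two fingerprints: shared probes / all positive probes."""
    union = fingerprint_a | fingerprint_b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fingerprint_a & fingerprint_b) / len(union)
```

Samples whose fingerprints score near 1 under such a measure would be candidates for classification together, while scores near 0 would suggest unrelated sources.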

[0391] In addition, fingerprinting may be used to identify an individual source of biological sample. See, e.g., Lander, E. (1989) Nature, 339:501-505, and references therein. For example, a DNA fingerprint may be used to determine whether a genetic sample arose from another individual. This would be particularly useful in various sorts of forensic tests to determine, e.g., paternity or sources of blood samples. Significant detail on the particulars of genetic fingerprinting for identification purposes is described in, e.g., Morris et al. (1989) “Biostatistical evaluation of evidence from continuous allele frequency distribution DNA probes in reference to disputed paternity and identity,” J. Forensic Science 34:1311-1317; and Neufeld et al. (1990) Scientific American 262:46-53; each of which is hereby incorporated herein by reference.

[0392] In another embodiment, a fingerprinting-like procedure may be used for classifying cell types by analyzing a pattern of specific nucleic acids present in the cell. A series of antibodies may be used to identify cell markers, e.g., proteins, usually on the cell surface, but intracellular markers may also be used. Antigens which are extracellularly expressed are preferred so cell lysis is unnecessary in the screening, but intracellular markers may also be useful. The markers will usually be proteins, but may be nucleic acids, lipids, metabolites, carbohydrates, or other cellular components. See, e.g., Winkelgren, I. (1990) Science News 136:234-237, which indicates that extracellular DNA may be common and suggests that such DNA might be characteristic of cell types, stage, or physiology. This may also be useful in defining the temporal stage of development of cells, e.g., stem cells or other cells which undergo temporal changes in development. For example, the stage of a cell, or group of cells, may be tested or defined by isolating a sample of mRNA from the population and testing to see what sequences are present in messenger populations. Direct samples, or amplified samples, may be used. Where particular mRNA or other nucleic acid sequences may be characteristic of or shown to be characteristic of particular developmental stages, physiological states, or other conditions, this fingerprinting method may define them. Similar sorts of fingerprinting may be used for determining T-cell classes or perhaps even to generate classification schemes for such proteins as major histocompatibility complex antigens. Thus, the ability to make these substrates not only allows the generation of reagents which will be used for defining subclasses or classes of cells or other biological materials, but also provides the mechanisms for selecting those cells which may be found in defined population groups.

[0393] In addition to cell classification defined by such a combination of properties, typically expression of extracellular antigens, the present invention also provides the means for isolating a homogeneous population of cells. Once the antigenic determinants which define a cell class have been identified, these antigens may be used in a sequential selection process to isolate only those cells which exhibit the combination of defining structural properties.

[0394] The present invention may also be used for mapping sequences within a larger segment. This may be performed by at least two methods, particularly in reference to nucleic acids. Often, enormous segments of DNA are subcloned into a large plurality of subsequences. Ordering these subsequences may be important in determining the overlaps of sequences upon nucleotide determinations. Mapping may be performed by immobilizing particularly large segments onto a matrix using the VLSIPS™ Technology. Alternatively, sequences may be ordered by virtue of subsequences shared by overlapping segments. See, e.g., Craig et al. (1990) Nuc. Acids Res. 18:2653-2660; Michiels et al. (1987) CABIOS 3:203-210; and Olson et al. (1986) Proc. Natl. Acad. Sci. USA 83:7826-7830.

[0395] B. Important Parameters

[0396] The extent of specific interaction between reagents immobilized to the VLSIPS™ Technology substrate and another sequence specific reagent may be modified by the conditions of the interaction. Sequencing embodiments typically require high fidelity hybridization and the ability to discriminate perfect matching from imperfect matching. Fingerprinting and mapping embodiments may be performed using less stringent conditions, depending upon the circumstances.

[0397] For example, the specificity of antibody/antigen interaction may depend upon such parameters as pH, salt concentration, ionic composition, solvent composition, detergent composition and concentration, and chaotropic agent concentration. See, e.g., Harlow and Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Press, New York. By careful control of these parameters, the affinity of binding may be mapped across different sequences.

[0398] In a nucleic acid hybridization embodiment, the specificity and kinetics of hybridization have been described in detail by, e.g., Wetmur and Davidson (1968) J. Mol. Biol., 31:349-370, Britten and Kohne (1968) Science 161:529-530, and Kanehisa, (1984) Nuc. Acids Res. 12:203-213, each of which is hereby incorporated herein by reference. Parameters which are well known to affect specificity and kinetics of reaction include salt conditions, ionic composition of the solvent, hybridization temperature, length of oligonucleotide matching sequences, guanine and cytosine (GC) content, presence of hybridization accelerators, pH, specific bases found in the matching sequences, solvent conditions, and addition of organic solvents.

[0399] In particular, the salt conditions required for driving highly mismatched sequences to completion typically include a high salt concentration. The typical salt used is sodium chloride (NaCl); however, other ionic salts may be utilized, e.g., KCl. Depending on the desired stringency of hybridization, the salt concentration will often be less than about 3 molar, more often less than 2.5 molar, usually less than about 2 molar, and more usually less than about 1.5 molar. For applications directed towards higher stringency matching, the salt concentrations would typically be lower. Ordinary high stringency conditions will utilize salt concentration of less than about 1 molar, more often less than about 750 millimolar, usually less than about 500 millimolar, and may be as low as about 250 or 150 millimolar.

[0400] The kinetics of hybridization and the stringency of hybridization both depend upon the temperature at which the hybridization is performed and the temperature at which the washing steps are performed. Temperatures for steps in which low stringency hybridization is desired would typically be lower, e.g., ordinarily at least about 15° C., more ordinarily at least about 20° C., usually at least about 25° C., and more usually at least about 30° C. For those applications requiring high stringency hybridization, or fidelity of hybridization and sequence matching, temperatures at which hybridization and washing steps are performed would typically be high. For example, temperatures in excess of about 35° C. would often be used, more often in excess of about 40° C., usually at least about 45° C., and occasionally even temperatures as high as about 50° C. or 60° C. or more. Of course, the hybridization of oligonucleotides may be disrupted by even higher temperatures. Thus, for stripping of targets from substrates, as discussed below, temperatures as high as 80° C., or even higher may be used.

[0401] The base composition of the specific oligonucleotides involved in hybridization affects the temperature of melting, and the stability of hybridization as discussed in the above references. However, the bias of GC rich sequences to hybridize faster and retain stability at higher temperatures can be compensated for by the inclusion in the hybridization incubation or wash steps of various buffers. Sample buffers which accomplish this result include the triethyl- and trimethylammonium buffers. See, e.g., Wood et al. (1985) Proc. Natl. Acad. Sci. USA, 82:1585-1588, and Khrapko, K. et al. (1989) FEBS Letters 256:118-122.
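The GC bias can be illustrated with the Wallace rule, a widely used rule of thumb for short oligonucleotides that assigns about 2° C. per A or T and about 4° C. per G or C. The rule is not taken from the present description or the references cited above; it is a rough approximation under standard salt conditions and is shown only to make the composition effect concrete. The function name is hypothetical.

```python
def wallace_tm_celsius(oligo):
    """Rough melting temperature of a short oligonucleotide (Wallace rule).

    Tm = 2 * (A + T) + 4 * (G + C), in degrees Celsius. This is a rule of
    thumb, not a thermodynamic calculation.
    """
    oligo = oligo.upper()
    if set(oligo) - set("ACGT"):
        raise ValueError("oligo must contain only A, C, G, and T")
    at_count = oligo.count("A") + oligo.count("T")
    gc_count = oligo.count("G") + oligo.count("C")
    return 2 * at_count + 4 * gc_count
```

By this estimate, an all-GC octamer melts roughly 16° C. higher than an all-AT octamer, which is the kind of spread that the alkylammonium buffers are intended to reduce.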

[0402] The rate of hybridization can also be affected by the inclusion of particular hybridization accelerators. These hybridization accelerators include the volume exclusion agents characterized by dextran sulfate, or polyethylene glycol (PEG). Dextran sulfate is typically included at a concentration of between 1% and 40% by weight. The actual concentration selected depends upon the application, but typically a faster hybridization is desired in which the concentration is optimized for the system in question. Dextran sulfate is often included at a concentration of between 0.5% and 2% by weight or dextran sulfate at a concentration between about 0.5% and 5%. Alternatively, proteins which accelerate hybridization may be added, e.g., the recA protein found in E. coli or other homologous proteins.

[0403] With respect to those embodiments where specific reagents are not oligonucleotides, the conditions of specific interaction would depend on the affinity of binding between the specific reagent and its target. Typically, parameters which would be of particular importance would be pH, salt concentration, anion and cation compositions, buffer concentration, organic solvent inclusion, detergent concentration, and inclusion of reagents such as chaotropic agents. In particular, the affinity of binding may be tested over a variety of conditions by multiple washes and repeat scans or by using reagents with differences in binding affinity to determine which reagents bind or do not bind under the selected binding and washing conditions. The spectrum of binding affinities may provide an additional dimension of information which may be very useful in identification purposes and mapping.

[0404] Of course, the specific hybridization conditions will be selected to correspond to a discriminatory condition which provides a positive signal where desired but fails to show a positive signal at affinities where interaction is not desired. This may be determined by a number of titration steps or with a number of controls which will be run during the hybridization and/or washing steps to determine at what point the hybridization conditions have reached the stage of desired specificity.
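A titration of this kind can be summarized by recording, for each tested condition, the signal from a perfect-match control and from a mismatch control, and keeping the conditions that separate the two. The sketch below is illustrative only; the data layout, the single fixed threshold, and the function name are assumptions, and the example readings are invented placeholders rather than measured values.

```python
def discriminating_conditions(readings, threshold):
    """Select conditions that give a positive match and a negative mismatch.

    readings: iterable of (condition_label, match_signal, mismatch_signal).
    A signal counts as positive when it is at or above the threshold.
    """
    return [
        label
        for label, match_signal, mismatch_signal in readings
        if match_signal >= threshold and mismatch_signal < threshold
    ]


# Hypothetical titration: stringency increases from the first row to the last.
example_readings = [
    ("1 M NaCl, 25 C", 900, 700),     # too permissive: mismatch still positive
    ("0.5 M NaCl, 40 C", 800, 150),   # discriminating
    ("0.15 M NaCl, 60 C", 90, 10),    # too stringent: match signal lost
]
selected = discriminating_conditions(example_readings, threshold=200)
```

In this invented example, only the intermediate condition is retained, reflecting the point at which the hybridization conditions reach the desired specificity.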

[0405] IX. Detection Methods

[0406] Methods for detection depend upon the label selected. The criteria for selecting an appropriate label are discussed below; however, a fluorescent label is preferred because of its extreme sensitivity and simplicity. Standard labeling procedures are used to determine the positions where interactions between a sequence and a reagent take place. For example, if a target sequence is labeled and exposed to a matrix of different probes, only those locations where probes do interact with the target will exhibit any signal. Alternatively, other methods may be used to scan the matrix to determine where interaction takes place. Of course, the spectrum of interactions may be determined in a temporal manner by repeated scans of interactions which occur at each of a multiplicity of conditions. However, instead of testing each individual interaction separately, a multiplicity of sequence interactions may be simultaneously determined on a matrix.

[0407] A. Labeling Techniques

[0408] The target polynucleotide may be labeled by any of a number of convenient detectable markers. A fluorescent label is preferred because it provides a very strong signal with low background. It is also optically detectable at high resolution and sensitivity through a quick scanning procedure. Other potential labeling moieties include radioisotopes, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, magnetic labels, and linked enzymes.

[0409] Another method for labeling may bypass any label of the target sequence. The target may be exposed to the probes, and a double strand hybrid is formed at those positions only. Addition of a double strand specific reagent will detect where hybridization takes place. An intercalative dye such as ethidium bromide may be used as long as the probes themselves do not fold back on themselves to a significant extent forming hairpin loops. See, e.g., Sheldon et al. (1986) U.S. Pat. No. 4,582,789. However, the length of the hairpin loops in short oligonucleotide probes would typically be insufficient to form a stable duplex.

[0410] In another embodiment, different targets may be simultaneously sequenced where each target has a different label. For instance, one target could have a green fluorescent label and a second target could have a red fluorescent label. The scanning step will distinguish sites of binding of the red label from those binding the green fluorescent label. Each sequence can be analyzed independently from one another.
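The two-label readout can be pictured as a per-site classification of the two scanned channels. The following sketch assumes that each site yields one background-corrected intensity per channel and that a single threshold separates signal from background; the threshold, the labels returned, and the function name are hypothetical.

```python
def classify_site(green_intensity, red_intensity, threshold):
    """Report which labeled target, if any, is bound at a matrix site."""
    green_bound = green_intensity >= threshold
    red_bound = red_intensity >= threshold
    if green_bound and red_bound:
        return "both"
    if green_bound:
        return "green"
    if red_bound:
        return "red"
    return "none"
```

Collecting the sites classified as "green" (together with "both") gives the hybridization pattern of the first target, and likewise for the second, so that each sequence can be analyzed independently from a single scan.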

[0411] Suitable chromogens will include molecules and compounds which absorb light in a distinctive range of wavelengths so that a color may be observed, or emit light when irradiated with radiation of a particular wavelength or wavelength range, e.g., fluorescers. Biliproteins, e.g., phycoerythrin, may also serve as labels.

[0412] A wide variety of suitable dyes are available, being primarily chosen to provide an intense color with minimal absorption by their surroundings. Illustrative dye types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine dyes, phthaleins, insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and phenazoxonium dyes.

[0413] A wide variety of fluorescers may be employed either by themselves or in conjunction with quencher molecules. Fluorescers of interest fall into a variety of categories having certain primary functionalities. These primary functionalities include 1- and 2-aminonaphthalene, p,p′-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p′-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bis-benzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes and flavin. Individual fluorescent compounds which have functionalities for linking or which can be modified to incorporate such functionalities include, e.g., dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; rhodamine isothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2′-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl 2-aminonaphthalene-6-sulfonate; ethidium bromide; stebrine; auramine O; 2-(9′-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N′-dioctadecyl oxacarbocyanine; N,N′-dihexyl oxacarbocyanine; merocyanine; 4-(3′-pyrenyl)butyrate; d-3-aminodesoxy-equilenin; 12-(9′-anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2′-(vinylene-p-phenylene)bisbenzoxazole; p-bis[2-(4-methyl-5-phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3′-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellebrigenin; chlorotetracycline; N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-[p-(2-benzimidazolyl)-phenyl]maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid); resazurin; 4-chloro-7-nitro-2,1,3-benzooxadiazole; merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone.

[0414] Desirably, fluorescers should absorb light above about 300 nm, preferably above about 350 nm, and more preferably above about 400 nm, usually emitting at wavelengths more than about 10 nm higher than the wavelength of the light absorbed. It should be noted that the absorption and emission characteristics of the bound dye may differ from those of the unbound dye. Therefore, when referring to the various wavelength ranges and characteristics of the dyes, it is intended to indicate the dyes as employed and not the dye which is unconjugated and characterized in an arbitrary solvent.

[0415] Fluorescers are generally preferred because by irradiating a fluorescer with light, one can obtain a plurality of emissions. Thus, a single label can provide for a plurality of measurable events.

[0416] Detectable signal may also be provided by chemiluminescent and bioluminescent sources. Chemiluminescent sources include a compound which becomes electronically excited by a chemical reaction and may then emit light which serves as the detectable signal or donates energy to a fluorescent acceptor. A diverse number of families of compounds have been found to provide chemiluminescence under a variety of conditions. One family of compounds is 2,3-dihydro-1,4-phthalazinedione. The most popular compound is luminol, which is the 5-amino compound. Other members of the family include the 5-amino-6,7,8-trimethoxy- and the dimethylamino[ca]benz analog. These compounds can be made to luminesce with alkaline hydrogen peroxide or calcium hypochlorite and base. Another family of compounds is the 2,4,5-triphenylimidazoles, with lophine as the common name for the parent product. Chemiluminescent analogs include para-dimethylamino and -methoxy substituents. Chemiluminescence may also be obtained with oxalates, usually oxalyl active esters, e.g., p-nitrophenyl and a peroxide, e.g., hydrogen peroxide, under basic conditions. Alternatively, luciferins may be used in conjunction with luciferase or lucigenins to provide bioluminescence.

[0417] Spin labels are provided by reporter molecules with an unpaired electron spin which can be detected by electron spin resonance (ESR) spectroscopy. Exemplary spin labels include organic free radicals, transition metal complexes, particularly vanadium, copper, iron, and manganese, and the like. Exemplary organic free radicals include nitroxide free radicals.

[0418] B. Scanning System

[0419] With the automated detection apparatus, the correlation of specific positional labeling is converted to the presence on the target of sequences for which the reagents have specificity of interaction. Thus, the positional information is directly converted to a database indicating what sequence interactions have occurred. For example, in a nucleic acid hybridization application, the sequences which have interacted between the substrate matrix and the target molecule can be directly listed from the positional information. The detection system used is described in Pirrung et al. (1992) U.S. Pat. No. 5,143,854; and Ser. No. 07/624,120, now abandoned. Although the detection described therein is a fluorescence detector, the detector may be replaced by a spectroscopic or other detector. The scanning system may make use of a moving detector relative to a fixed substrate, a fixed detector with a moving substrate, or a combination. Alternatively, mirrors or other apparatus can be used to transfer the signal directly to the detector. See, e.g., Ser. No. 07/624,120, now abandoned, which is hereby incorporated herein by reference.

[0420] The detection method will typically also incorporate some signal processing to determine whether the signal at a particular matrix position is a true positive or may be a spurious signal. For example, a signal from a region which has actual positive signal may tend to spread over and provide a positive signal in an adjacent region which actually should not have one. This may occur, e.g., where the scanning system is not properly discriminating with sufficiently high resolution in its pixel density to separate the two regions. Thus, the signal over the spatial region may be evaluated pixel by pixel to determine the locations and the actual extent of positive signal. A true positive signal should, in theory, show a uniform signal at each pixel location. Thus, a plot of the number of pixels against actual signal intensity should show a clearly uniform signal intensity. Regions where the signal intensities show a fairly wide dispersion may be particularly suspect, and the scanning system may be programmed to scan those positions more carefully.
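By way of illustration only, the dispersion test described above might be sketched as follows. The function name, the use of the coefficient of variation as the dispersion measure, and the 0.25 cutoff are assumptions made for this sketch and are not taken from the referenced detection system.

```python
from statistics import mean, pstdev


def flag_suspect_regions(regions, max_cv=0.25):
    """Return names of regions whose pixel intensities are widely dispersed.

    `regions` maps a region name to the list of pixel intensities measured
    inside that region. `max_cv` is a hypothetical cutoff on the coefficient
    of variation (standard deviation divided by mean).
    """
    suspect = []
    for name, pixels in regions.items():
        average = mean(pixels)
        if average == 0:
            continue  # no signal at all, nothing to rescan
        if pstdev(pixels) / average > max_cv:
            suspect.append(name)
    return suspect


regions = {
    "A1": [100, 102, 98, 101],  # uniform signal: likely a true positive
    "A2": [5, 40, 90, 10],      # wide dispersion: possible spill-over
    "A3": [0, 0, 0, 0],         # no signal
}
suspect = flag_suspect_regions(regions)
```

Regions returned by such a check would be candidates for a second, more careful scan.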

[0421] In another embodiment, as the sequence of a target is determined at a particular location, the overlap for the sequence would necessarily have a known sequence. Thus, the system can compare the possibilities for the next adjacent position and look at these in comparison with each other. Typically, only one of the possible adjacent sequences should give a positive signal and the system might be programmed to compare each of these possibilities and select that one which gives a strong positive. In this way, the system can also simultaneously provide some means of measuring the reliability of the determination by indicating what the average signal to background ratio actually is.
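A minimal sketch of this comparison, assuming a nucleic acid application in which each candidate probe is the known overlap extended by one of the four bases, is given below. The function name and the way the reliability figure is computed (strongest candidate signal divided by a supplied background level) are illustrative assumptions only.

```python
def call_next_base(known_overlap, signals, background):
    """Pick the one-base extension of `known_overlap` with the strongest signal.

    `signals` maps probe sequences to measured intensities; probes absent
    from the mapping are treated as giving no signal. `background` is an
    assumed positive background level used for the reliability ratio.
    """
    candidates = {
        base: signals.get(known_overlap + base, 0.0) for base in "ACGT"
    }
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / background


signals = {
    "ACGTACGTAC": 950.0,  # strong positive
    "ACGTACGTAG": 40.0,
    "ACGTACGTAA": 35.0,
}
base, ratio = call_next_base("ACGTACGTA", signals, background=50.0)
```

A low ratio for the selected base would indicate that the determination at that position is unreliable.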

[0422] More sophisticated signal processing techniques can be applied to the initial determination of whether a positive signal exists or not. See, e.g., Ser. No. 07/624,120, now abandoned.

[0423] From a listing of those sequences which interact, data analysis may be performed on a series of sequences. For example, in a nucleic acid sequence application, each of the sequences may be analyzed for its overlap regions and the original target sequence may be reconstructed from the collection of specific subsequences obtained therein. Other sorts of analyses for different applications may also be performed, and because the scanning system directly interfaces with a computer the information need not be transferred manually. This provides for the ability to handle large amounts of data with very little human intervention. This, of course, provides significant advantages over manual manipulations. Increased throughput and reproducibility are thereby provided by the automation of a vast majority of steps in any of these applications.

[0424] XI. Data Analysis

[0425] A. General

[0426] Data analysis will typically involve aligning the proper sequences with their overlaps to determine the target sequence. Although the target “sequence” may not specifically correspond to any specific molecule, especially where the target sequence is broken and fragmented in the sequencing process, the sequence corresponds to a contiguous sequence of the subfragments.

[0427] The data analysis can be performed by a computer using an appropriate program. See, e.g., Drmanac, R. et al. (1989) Genomics 4:114-128; and a commercially available analysis program available from the Genetic Engineering Center, P.O. Box 794, 11000 Belgrade, Yugoslavia. Although the specific manipulations necessary to reassemble the target sequence from fragments may take many forms, one embodiment uses a sorting program to sort all of the subsequences using a defined hierarchy. The hierarchy need not necessarily correspond to any physical hierarchy, but provides a means to determine, in order, which subfragments have actually been found in the target sequence. In this manner, overlaps can be checked and found directly rather than having to search throughout the entire set after each selection process. For example, where the oligonucleotide probes are 10-mers, the first 9 positions can be sorted. A particular subsequence can be selected, as in the examples, to determine where the process starts. Analogous to the theoretical example provided above, the sorting procedure provides the ability to immediately find the position of the subsequence which contains the first 9 positions and to check whether more than one subsequence shares those first 9 positions. In fact, the computer can easily generate all of the possible target sequences which contain a given combination of subsequences. Typically there will be only one, but in various situations, there will be more.
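The sorting and overlap lookup described above can be sketched, purely by way of example, as follows. The sketch indexes each subsequence by its first k-1 positions so that overlaps are found directly, and it enumerates every target sequence that uses each detected subsequence exactly once. The function name, the 4-mer example, and the assumption that no subsequence occurs twice in the target are illustrative and are not drawn from the cited program.

```python
from collections import defaultdict


def reconstruct(subsequences, start):
    """Return all sequences that begin with `start` and use every k-mer once.

    Assumes all subsequences have the same length k >= 2 and that each
    occurs only once in the target (a simplifying assumption).
    """
    k = len(start)
    by_prefix = defaultdict(list)
    for sub in sorted(subsequences):      # the "defined hierarchy": sorted order
        by_prefix[sub[:k - 1]].append(sub)

    results = []

    def extend(sequence, used):
        if len(used) == len(subsequences):
            results.append(sequence)
            return
        # Direct lookup of every subsequence overlapping the last k-1 positions.
        for nxt in by_prefix.get(sequence[-(k - 1):], []):
            if nxt not in used:
                extend(sequence + nxt[-1], used | {nxt})

    extend(start, {start})
    return results


detected = {"ATGC", "TGCG", "GCGT", "CGTA", "GTAC"}
targets = reconstruct(detected, "ATGC")
```

When more than one subsequence shares a prefix, the loop branches and more than one candidate target may be returned, corresponding to the situations noted above in which the reconstruction is not unique.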

[0428] An exemplary flow chart for a sequencing program is provided in FIG. 1. In general terms, the program provides for automated scanning of the substrate to determine the positions of probe and target interaction. Simple processing of the intensity of the signal may be incorporated to filter out clearly spurious signals. The positions with positive interaction are correlated with the sequence specificity of specific matrix positions, to generate the set of matching subsequences. This information is further correlated with other target sequence information, e.g., restriction fragment analysis. The sequences are then aligned using overlap data, thereby leading to possible corresponding target sequences which will, optimally, correspond to a single target sequence.
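The filtering and correlation steps of such a program might, for instance, be sketched as below. This is not the flow chart of FIG. 1; the function name, the coordinate keys, and the fixed intensity threshold are hypothetical and serve only to show positions being converted into a set of matching subsequences.

```python
def matching_subsequences(intensities, probe_at, threshold):
    """Convert scanned positions into the sorted list of matching probes.

    `intensities` maps a (row, column) matrix position to its measured
    signal; `probe_at` maps the same positions to the probe sequence
    synthesized there. Signals below `threshold` are treated as spurious.
    """
    return sorted(
        probe_at[position]
        for position, value in intensities.items()
        if value >= threshold
    )


intensities = {(0, 0): 900, (0, 1): 12, (1, 0): 870, (1, 1): 30}
probe_at = {(0, 0): "ATGC", (0, 1): "ATGG", (1, 0): "TGCG", (1, 1): "TGCC"}
matches = matching_subsequences(intensities, probe_at, threshold=100)
```

The resulting list is the input to the overlap alignment step.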

[0429] B. Hardware

[0430] A variety of computer systems may be used to run a sequencing program. The program may be written to provide both the detecting and scanning steps together and will typically be dedicated to a particular scanning apparatus. However, the components and functional steps may be separated and the scanning system may provide an output, e.g., through tape or an electronic connection into a separate computer which separately runs the sequencing analysis program. The computer may be any of a number of machines provided by standard computer manufacturers, e.g., IBM compatible machines, Apple™ machines, VAX machines, and others, which may often use a UNIX™ operating system. Of course, the hardware used to run the analysis program will typically determine what programming language would be used.

[0431] C. Software

[0432] Software would be easily developed by a person of ordinary skill in the programming art, following the flow chart provided, or based upon the input provided and the desired result.

[0433] Of course, an exemplary embodiment is a polynucleotide sequence system. However, the theoretical and mathematical manipulations necessary for data analysis of other linear molecules, such as polypeptides, carbohydrates, and various other polymers are conceptually similar. Simple branching polymers will usually also be sequencable using similar technology. However, where there is branching, it may be desired that additional recognition reagents be used to determine the nature and location of branches. This can easily be provided by use of appropriate specific reagents which would be generated by methods similar to those used to produce specific reagents for linear polymers.

[0434] XII. Substrate Reuse

[0435] Where a substrate is made with specific reagents that are relatively insensitive to the handling and processing steps involved in a single cycle of use, the substrate may often be reused. The target molecules are usually stripped off of the solid phase specific recognition molecules. Of course, it is preferred that the manipulations and conditions be selected so as to be mild and to not affect the substrate. For example, if a substrate is acid labile, a neutral pH would be preferred in all handling steps. Similar sensitivities would be carefully respected where recycling is desired.

[0436] A. Removal of Label

[0437] Typically, for recycling, the previously attached specific interaction would be disrupted and removed. This will typically involve exposing the substrate to conditions under which the interaction between probe and target is disrupted. Alternatively, it may be exposed to conditions where the target is destroyed. For example, where the probes are oligonucleotides and the target is a polynucleotide, a heating and low salt wash will often be sufficient to disrupt the interactions. Additional reagents may be added such as detergents, and organic or inorganic solvents which disrupt the interaction between the specific reagents and target. In an embodiment where the specific reagents are antibodies, the substrate may be exposed to a gentle detergent which will denature the specific binding between the antibody and its target. The conditions are selected to avoid severe disruption or destruction of the structure of the antibody and to maintain the specificity of the antibody binding site. Conditions with specific pH, detergent concentration, salt concentration, ionic concentration, and other parameters may be selected which disrupt the specific interactions.

[0438] B. Storage and Preservation

[0439] As indicated above, the matrix will typically be maintained under conditions where the matrix itself and the linkages and specific reagents are preserved. Various specific preservatives may be added which prevent degradation. For example, if the reagents are acid or base labile, a neutral pH buffer will typically be added. It is also desired to avoid destruction of the matrix by growth of organisms which may destroy organic reagents attached thereto. For this reason, a preservative such as cyanide or azide may be added. However, the chemical preservative should also be selected to preserve the chemical nature of the linkages and other components of the substrate. Typically, a detergent may also be included.

[0440] C. Processes to Avoid Degradation of Oligomers

[0441] In particular, a substrate comprising a large number of oligomers will be treated in a fashion which is known to maintain the quality and integrity of oligonucleotides. These include storing the substrate in a carefully controlled environment under conditions of lower temperature, cation depletion (EDTA and EGTA), sterile conditions, and inert argon or nitrogen atmosphere.

[0442] XIII. Integrated Sequencing Strategy

[0443] A. Initial Mapping Strategy

[0444] As indicated above, although the VLSIPS™ technology may be applied to sequencing embodiments, it is often useful to integrate other concepts to simplify the sequencing. For example, nucleic acids may be easily sequenced by careful selection of the vectors and hosts used for amplifying and generating the specific target sequences. For example, it may be desired to use specific vectors which have been designed to interact most efficiently with the VLSIPS substrate. This is also important in fingerprinting and mapping strategies. For example, vectors may be carefully selected having particular complementary sequences which are designed to attach to a genetic or specific oligomer on the substrate. This is also applicable to situations where it is desired to target particular sequences to specific locations on the matrix.

[0445] In one embodiment, unnatural oligomers may be used to target natural probes to specific locations on the VLSIPS substrate. In addition, particular probes may be generated for the mapping embodiment which are designed to have specific combinations of characteristics. For example, the construction of a mapping substrate may depend upon use of another automated apparatus which takes clones isolated from a chromosome walk and attaches them individually or in bulk to the VLSIPS substrate.

[0446] In another embodiment, a variety of specific vectors having known and particular “targeting” sequences adjacent to the cloning sites may be individually used to clone a selected probe, and the isolated probe will then be targetable to a site on the VLSIPS substrate with a sequence complementary to the “target” sequence.

[0447] B. Selection of Smaller Clones

[0448] In the fingerprinting and mapping embodiments, the selection of probes may be very important. Significant mathematical analysis may be applied to determine which specific sequences should be used as those probes. Of course, for fingerprinting use, the sequences most desired would be those that show significant heterogeneity across the human population. The specific sequences which would most favorably be utilized will tend to be single copy sequences within the genome.

[0449] Various hybridization selection procedures may be applied to select sequences which tend not to be repeated within a genome, and thus would tend to be conserved across individuals. For example, hybridization selections may be made for non-repetitive and single copy sequences. See, e.g., Britten and Kohne (1968) “Repeated Sequences in DNA,” Science 161:529-540. On the other hand, it may be desired under certain circumstances to use repeated sequences. For example, where a fingerprint may be used to identify or distinguish different species, or where repetitive sequences may be diagnostic of specific species, repetitive sequences may be desired for inclusion in the fingerprinting probes. In either case, the sequencing capability will greatly assist in the selection of appropriate sequences to be used as probes.

[0450] Also as indicated above, various means for constructing an appropriate substrate may involve either mechanical or automated procedures. The standard VLSIPS automated procedure involves synthesizing oligonucleotides or short polymers directly on the substrate. In various other embodiments, it is possible to attach separately synthesized reagents onto the matrix in an ordered array. Other circumstances may lend themselves to transferring a pattern from a petri plate onto a solid substrate. Also, there are methods for site-specifically directing collections of reagents to specific locations using unnatural nucleotides or equivalent sorts of targeting molecules.

[0451] While a brute force manual transfer process may be utilized sequentially for attaching various samples to successive positions, instrumentation for automating such procedures may also be devised. The automated system for performing such would preferably be relatively easily designed and conceptually easily understood.

[0452] XIV. Commercial Applications

[0453] A. Sequencing

[0454] As indicated above, sequencing may be performed either de novo or as a verification of another sequencing method. The present hybridization technology provides the ability to sequence nucleic acids and polynucleotides de novo, or as a means to verify either the Maxam and Gilbert chemical sequencing technique or Sanger and Coulson dideoxy-sequencing techniques. The hybridization method is useful to verify sequencing determined by any other sequencing technique and to closely compare two similar sequences, e.g., to identify and locate sequence differences.

[0455] Besides polynucleotide sequencing, the present invention also provides means for sequencing other polymers. This includes polypeptides, carbohydrates, synthetic organic polymers, and other polymers. Again, the sequencing may be either verification or de novo.

[0456] Of course, sequencing can be very important in many different sorts of environments. For example, it will be useful in determining the genetic sequence of particular markers in various individuals. In addition, polymers may be used as markers or as information containing molecules to encode information. For example, a short polynucleotide sequence may be included in large bulk production samples indicating the manufacturer, date, and location of manufacture of a product. For example, various drugs may be encoded with this information with a small number of molecules in a batch. For example, a pill may have somewhere from 10 to 100 to 1,000 or more very short and small molecules encoding this information. When necessary, this information may be decoded from a sample of the material using a polymerase chain reaction (PCR) or other amplification method. This encoding system may be used to provide the origin of large bulky samples without significantly affecting the properties of those samples. For example, chemical samples may also be encoded by this method, thereby providing means for identifying the source and manufacturing details of lots. The origin of bulk hydrocarbon samples may be encoded. Production lots of organic compounds such as benzene or plastics may be encoded with a short molecule polymer. Foodstuffs may also be encoded using similar marking molecules. Even toxic waste samples can be encoded to identify the source or origin. In this way, proper disposal can be traced or more easily enforced.
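One hypothetical way of encoding such information in a short polynucleotide is sketched below. The two-bits-per-base mapping (A, C, G, T standing for 00, 01, 10, 11), the function names, and the example lot label are assumptions introduced for illustration; no particular coding scheme is prescribed herein.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def encode(text):
    """Encode ASCII text as a base sequence, four bases per character."""
    bases = []
    for byte in text.encode("ascii"):
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(sequence):
    """Recover the ASCII text from a sequence produced by `encode`."""
    data = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("ascii")


label = "LOT42 1990-12-06"   # hypothetical manufacturer lot and date
marker = encode(label)       # 64-base marker sequence
recovered = decode(marker)
```

Under this assumed mapping a 16-character label occupies 64 bases, which is within the range of short marker molecules contemplated above.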

[0457] Similar sorts of encoding may be provided by fingerprinting-type analysis. Whether the resolution is absolute or less so, the concept of coding information on molecules such as nucleic acids, which can be amplified and later decoded, may be a very useful and important application.

[0458] This technology also provides the ability to include markers for origins of biological materials. For example, a patented animal line may be transformed with a particular unnatural sequence which can be traced back to its origin. With a selection of multiple markers, the likelihood could be negligible that a combination of markers would have independently arisen from a source other than the patented or specifically protected source. This technique may provide a means for tracing the actual origin of particular biological materials. Bacteria, plants, and animals will be subject to marking by such encoding sequences.

[0459] B. Fingerprinting

[0460] As indicated above, fingerprinting technology may also be used for data encryption. Moreover, fingerprinting allows for significant identification of particular individuals. Where the fingerprinting technology is standardized, and used for identification of large numbers of people, related equipment and peripheral processing will be developed to accompany the underlying technology. For example, specific equipment may be developed for automatically taking a biological sample and generating or amplifying the information molecules within the sample to be used in fingerprinting analysis. Moreover, the fingerprinting substrate may be mass produced using particular types of automatic equipment. Synthetic equipment may produce the entire matrix simultaneously by stepwise synthetic methods as provided by the VLSIPS™ technology. The attachment of specific probes onto a substrate may also be automated, e.g., making use of the caged biotin technology. See, e.g., Barrett et al. (1993) U.S. Pat. No. 5,252,743. As indicated above, there are automated methods for actually generating the matrix and substrate with distinct sequence reagents positionally located at each of the matrix positions. Where such reagents are, e.g., unnatural amino acids, a targeting function may be utilized which does not interfere with a natural nucleotide functionality.

[0461] In addition, peripheral processing may be important and may be dedicated to this specific application. Thus, automated equipment for producing the substrates may be designed, or particular systems which take in a biological sample and output either a computer readout or an encoded instrument, e.g., a card or document which indicates the information and can provide that information to others. An identification having a short magnetic strip with a few million bits may be used to provide individual identification and important medical information useful in a medical emergency.

[0462] In fact, data banks may be set up to correlate all of this information of fingerprinting with medical information. This may allow for the determination of correlations between various medical problems and specific DNA sequences. By collating large populations of medical records with genetic information, genetic propensities and genetic susceptibilities to particular medical conditions may be developed. Moreover, with standardization of substrates, the micro encoding data may be also standardized to reproduce the information from a centralized data bank or on an encoding device carried on an individual person. On the other hand, if the fingerprinting procedure is sufficiently quick and routine, every hospital may routinely perform a fingerprinting operation and from that determine many important medical parameters for an individual.

[0463] In particular industries, the VLSIPS sequencing, fingerprinting, or mapping technology will be particularly appropriate. As mentioned above, agricultural livestock suppliers may be able to encode and determine whether their particular strains are being used by others. By incorporating particular markers into their genetic stocks, the markers will indicate origin of genetic material. This is applicable to seed producers, livestock producers, and other suppliers of medical or agricultural biological materials.

[0464] This may also be useful in identifying individual animals or plants. For example, these markers may be useful in determining whether certain fish return to their original breeding grounds, whether sea turtles always return to their original birthplaces, or to determine the migration patterns and viability of populations of particular endangered species. It would also provide means for tracking the sources of particular animal products. For example, it might be useful for determining the origins of controlled animal substances such as elephant ivory or particular bird populations whose importation or exportation is controlled.

[0465] As indicated above, polymers may be used to encode important information on source and batch and supplier. This is described in greater detail in, e.g., “Applications of PCR to industrial problems,” (1990) in Chemical and Engineering News 68:145, which is hereby incorporated herein by reference. In fact, the synthetic method can be applied to the storage of enormous amounts of information. Small substrates may encode enormous amounts of information, and its recovery will make use of the inherent replication capacity. For example, with regions of 10 μm × 10 μm, 1 cm² has 10⁶ regions. In theory, the entire human genome could be attached in 1000 nucleotide segments on a 3 cm² surface. Genomes of endangered species may be stored on these substrates.
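The arithmetic behind these figures can be checked with the short calculation below. The region size and segment length are taken from the text; the genome size of about 3 × 10⁹ nucleotides is an approximate figure assumed here for the haploid human genome, and the variable names are illustrative.

```python
REGION_EDGE_UM = 10          # edge of one square region, in micrometers
UM_PER_CM = 10_000           # micrometers per centimeter
GENOME_NT = 3_000_000_000    # assumed approximate haploid human genome size
SEGMENT_NT = 1_000           # nucleotides attached per region

# (10^4 um / 10 um)^2 = 10^6 regions in each square centimeter
regions_per_cm2 = (UM_PER_CM // REGION_EDGE_UM) ** 2

# 3 x 10^9 nt / 10^3 nt per segment = 3 x 10^6 segments
segments = GENOME_NT // SEGMENT_NT

# 3 x 10^6 segments / 10^6 regions per cm^2 = 3 cm^2
area_cm2 = segments / regions_per_cm2
```

One segment per region thus gives the 3 cm² surface stated above.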

[0466] Fingerprinting may also be used for genetic tracing or for identifying individuals for forensic science purposes. See, e.g., Morris, J. et al. (1989) “Biostatistical Evaluation of Evidence From Continuous Allele Frequency Distribution DNA Probes in Reference to Disputed Paternity and Identity,” J. Forensic Science 34:1311-1317, and references provided therein; each of which is hereby incorporated herein by reference.

[0467] In addition, the high resolution fingerprinting allows particular samples to be distinguished to high resolution. As indicated above, new cell classifications may be defined based on combinations of a large number of properties. Similar applications will be found in distinguishing different species of animals or plants. In fact, microbial identification may become dependent on characterization of the genetic content. Tumors or other cells exhibiting abnormal physiology will be detectable by use of the present invention. Also, knowing the genetic fingerprint of a microorganism may provide very useful information on how to treat an infection by such organism.

[0468] Modifications of the fingerprint embodiments may be used to diagnose the condition of the organism. For example, a blood sample is presently used for diagnosing any of a number of different physiological conditions. A multi-dimensional fingerprinting method made available by the present invention could become a routine means for diagnosing an enormous number of physiological features simultaneously. This may revolutionize the practice of medicine in providing information on an enormous number of parameters together at one time. In another way, the genetic predisposition may also revolutionize the practice of medicine providing a physician with the ability to predict the likelihood of particular medical conditions arising at any particular moment. It also provides the ability to apply preventive medicine.

[0469] The present invention might also find application in use for screening new drugs and new reagents which may be very important in medical diagnosis or other applications. For example, a description of generating a population of monoclonal antibodies with defined specificities may be very useful for producing various drugs or diagnostic reagents.

[0470] Also available are kits with the reagents useful for performing sequencing, fingerprinting, and mapping procedures. The kits will have various compartments with the desired necessary reagents, e.g., substrate, labeling reagents for target samples, buffers, and other useful accompanying products.

[0471] C. Mapping

[0472] The present invention also provides the means for mapping sequences within enormous stretches of sequence. For example, nucleotide sequences may be mapped within enormous chromosome size sequence maps. For example, it would be possible to map a chromosomal location within the chromosome which contains hundreds of millions of nucleotide base pairs. In addition, the mapping and fingerprinting embodiments allow for testing of chromosomal translocations, one of the standard problems for which amniocentesis is performed.

[0473] Thus, the present invention provides a powerful tool and the means for performing sequencing, fingerprinting, and mapping functions on polymers. Although the invention is most easily and directly applicable to polynucleotides, polypeptides, carbohydrates, and other sorts of molecules can also be advantageously utilized with the present technology.

[0474] The present invention will be better understood by reference to the following illustrative examples. The following examples are offered by way of illustration and not by way of limitation.

Experimental

[0475] I. Sequencing

[0476] A. polynucleotide

[0477] B. polypeptide

[0478] C. short peptide

[0479] 1. Herz antibody identification

[0480] II. Fingerprinting

[0481] A. polynucleotide fingerprint

[0482] B. peptide fingerprint

[0483] C. cell classification scheme

[0484] D. temporal development scheme

[0485] 1. developmental antigens

[0486] 2. developmental mRNA expression

[0487] E. diagnostic test

[0488] 1. viral identification

[0489] 2. bacterial identification

[0490] 3. other microbiological identifications

[0491] 4. allergy test (immobilized antigens)

[0492] F. individual (animal/plant) identification

[0493] 1. genetic

[0494] 2. immunological

[0495] G. genetic screen

[0496] 1. test alleles with markers

[0497] 2. amniocentesis

[0498] III. Mapping

[0499] A. positionally located clones (caged biotin)

[0500] 1. short probes, long targets

[0501] 2. long targets, short probes

[0502] B. positionally defined clones

[0503] IV. Conclusion

[0504] Relevant applications whose techniques are incorporated herein by reference are Pirrung, et al., Ser. No. 07/362,901, filed Jun. 7, 1989, now abandoned; Pirrung et al. (1992) U.S. Pat. No. 5,143,854; Barrett, et al., Ser. No. 07/435,316 filed Nov. 13, 1989, now abandoned; Barrett, et al. (1993) U.S. Pat. No. 5,252,743; and commonly assigned and simultaneously filed applications Ser. No. 07/624,120, now abandoned, and Ser. No. 07/626,730.

[0505] Also, additional relevant techniques are described, e.g., in Sambrook, J., et al. (1989) Molecular Cloning: a Laboratory Manual, 2d Ed., vols 1-3, Cold Spring Harbor Press, New York; Greenstein and Winitz (1961) Chemistry of the Amino Acids, Wiley and Sons, New York; Bodanszky, M. (1988) Peptide Chemistry: a Practical Textbook, Springer-Verlag, New York; Harlow and Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Press, New York; Glover, D. (ed.) (1987) DNA Cloning: A Practical Approach 1987) Nucleic Acid and Protein Sequence Analysis: A Practical Approach, IRL Press, Oxford; Hames and Higgins (1985) Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford; Wu et al. (1989) Recombinant DNA Methodology, Academic Press, San Diego; Goding (1986) Monoclonal Antibodies: Principles and Practice, (2d ed.), Academic Press, San Diego; Finegold and Barron (1986) Bailey and Scott's Diagnostic Microbiology, (7th ed.), Mosby Co., St. Louis; Collins et al. (1989) Microbiological Methods, (6th ed.), Butterworth, London; Chaplin and Kennedy (1986) Carbohydrate Analysis: A Practical Approach, IRL Press, Oxford; Van Dyke (ed.) (1985) Bioluminescence and Chemiluminescence: Instruments and Applications, vol 1, CRC Press, Boca Raton; and Ausubel et al. (ed.) (1990) Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York; each of which is hereby incorporated herein by reference.

[0506] The following examples are provided to illustrate the efficacy of the inventions herein. All operations were conducted at about ambient temperatures and pressures unless indicated to the contrary.

[0507] I. SEQUENCING

[0508] A. Polynucleotide

[0509] 1. HPLC of the photolysis of 5′-O-nitroveratryl-thymidine.

[0510] In order to determine the time for photolysis of 5′-O-nitroveratryl thymidine to thymidine, a 100 μM solution of NVO-Thym-OH (5′-O-nitroveratryl thymidine) in dioxane was made and ~200 μl aliquots were irradiated (in a quartz cuvette 1 cm×2 mm) at 362.3 nm for 20 sec, 40 sec, 60 sec, 2 min, 5 min, 10 min, 15 min, and 20 min. The resulting irradiated mixtures were then analyzed by HPLC using a Varian MicroPak SP column (C₁₈ analytical) at a flow rate of 1 ml/min and a solvent system of 40% CH₃CN and 60% water. Thymidine has a retention time of 1.2 min and NVO-Thym-OH has a retention time of 2.1 min. It was seen that after 10 min of exposure the deprotection was complete.

[0511] 2. Preparation and Detection of Thymidine-Cytidine Dimer (FITC)

[0512] The reaction is illustrated:

[0513] To an aminopropylated glass slide (standard VLSIPS™ Technology) was added a mixture of the following:

[0514] 12.2 mg of NVO-Thym-CO₂H (IX)

[0515] 3.4 mg of HOBT (N-hydroxybenzotriazole)

[0516] 8.8 μl DIEA (Diisopropylethylamine)

[0517] 11.1 mg BOP reagent

[0518] 2.5 ml DMF

[0519] After 2 h coupling time (standard VLSIPS) the plate was washed, acetylated with acetic anhydride/pyridine, washed, dried, and photolyzed in dioxane at 362 nm at 14 mW/cm² for 10 min using a 500 μm checkerboard mask. The slide was then taken and treated with a mixture of the following:

[0520] 107 mg of FMOC-amine modified C (III)

[0521] 21 mg of tetrazole

[0522] 1 ml anhydrous CH₃CN

[0523] After being treated for approximately 8 min, the slide was washed off with CH₃CN, dried, and oxidized with I₂/H₂O/THF/lutidine for 1 min. The slide was again washed, dried, and treated for 30 min with a 20% solution of DBU in DMF. After thorough rinsing of the slide, it was next exposed to a FITC solution (1 mM fluorescein isothiocyanate [FITC] in DMF) for 50 min, then washed, dried, and examined by fluorescence microscopy. This reaction is illustrated:

[0524] 3. Preparation and Detection of Thymidine-Cytidine Dimer (Biotin)

[0525] An aminopropyl glass slide was soaked in a solution of ethylene oxide (20% in DMF) to generate a hydroxylated surface. The slide was added to a mixture of the following:

[0526] 32 mg of NVO-T-OCED (X)

[0527] 11 mg of tetrazole

[0528] 0.5 ml of anhydrous CH₃CN

[0529] After 8 min the plate was then rinsed with acetonitrile, then oxidized with I₂/H₂O/THF/lutidine for 1 min, washed and dried. The slide was then exposed to a 1:3 mixture of acetic anhydride:pyridine for 1 h, then washed and dried. The substrate was then photolyzed in dioxane at 362 nm at 14 mW/cm² for 10 min using a 500 μm checkerboard mask, dried, and then treated with a mixture of the following:

[0530] 65 mg of biotin modified C (IV)

[0531] 11 mg of tetrazole

[0532] 0.5 ml anhydrous CH₃CN

[0533] After 8 min the slide was washed with CH₃CN then oxidized with I₂/H₂O/THF/lutidine for 1 min, washed, and then dried. The slide was then soaked for 30 min in a PBS/0.05% Tween 20 buffer and the solution then shaken off. The slide was next treated with FITC-labeled streptavidin at 10 μg/ml in the same buffer system for 30 min. After this time the streptavidin-buffer system was rinsed off with fresh PBS/0.05% Tween 20 buffer and then the slide was finally agitated in distilled water for about ½ h. After drying, the slide was examined by fluorescence microscopy.

[0534] 4. Substrate Preparation

[0535] Before attachment of reactive groups it is preferred to clean the substrate which is, in a preferred embodiment, a glass substrate such as a microscope slide or cover slip. A roughened surface will be useable but a plastic or other solid substrate is also appropriate. According to one embodiment the slide is soaked in an alkaline bath consisting of, e.g., 1 liter of 95% ethanol with 120 ml of water and 120 grams of sodium hydroxide for 12 hours. The slides are washed with a buffer and under running water, allowed to air dry, and rinsed with a solution of 95% ethanol.

[0536] The slides are then aminated with, e.g., aminopropyltriethoxysilane for the purpose of attaching amino groups to the glass surface on linker molecules, although other omega functionalized silanes could also be used for this purpose. In one embodiment 0.1% aminopropyltriethoxysilane is utilized, although solutions with concentrations from 10⁻⁷% to 10% may be used, with about 10⁻³% to 2% preferred. A 0.1% mixture is prepared by adding to 100 ml of a 95% ethanol/5% water mixture, 100 microliters (μl) of aminopropyltriethoxysilane. The mixture is agitated at about ambient temperature on a rotary shaker for an appropriate amount of time, e.g., about 5 minutes. 500 μl of this mixture is then applied to the surface of one side of each cleaned slide. After 4 minutes or more, the slides are decanted of this solution and thoroughly rinsed three times or more by dipping in 100% ethanol.
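By way of illustration only, the arithmetic behind the 0.1% mixture may be sketched as follows. The function name, and the reading of the stated percentages as volume/volume with the added silane volume neglected in the total, are assumptions made for this sketch and are not part of the described procedure.

```python
def silane_volume_ul(percent_v_v: float, solvent_volume_ml: float) -> float:
    """Microliters of silane to add to the solvent for a nominal
    volume/volume percentage (assumed interpretation of the text)."""
    # 1 ml = 1000 ul. The small added silane volume is not counted in the
    # total, matching the quoted recipe of 100 ul per 100 ml for 0.1%.
    return percent_v_v / 100.0 * solvent_volume_ml * 1000.0


if __name__ == "__main__":
    # 0.1% is the embodiment in the text; 1e-3% and 2% bound the preferred range.
    for pct in (1e-3, 0.1, 2.0):
        print(f"{pct}% in 100 ml -> {silane_volume_ul(pct, 100.0):g} ul")
```

Under these assumptions the 0.1% case reproduces the stated 100 μl per 100 ml.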

[0537] After the slides dry, they are heated in a 110-120° C. vacuum oven for about 20 minutes, and then allowed to cure at room temperature for about 12 hours in an argon environment. The slides are then dipped into DMF (dimethylformamide) solution, followed by a thorough washing with methylene chloride.

[0538] 5. Linker Attachment, Blocking of Free Sites

[0539] The aminated surface of the slide is then exposed to about 500 μl of, for example, a 30 millimolar (mM) solution of NVOC-nucleotide-NHS (N-hydroxysuccinimide) in DMF for attachment of a NVOC-nucleotide to each of the amino groups. See, e.g., SIGMA Chemical Company for various nucleotide derivatives. The surface is washed with, for example, DMF, methylene chloride, and ethanol.

[0540] Any unreacted aminopropyl silane groups on the surface, i.e., those amino groups which have not had the NVOC-nucleotide attached, are now capped with acetyl groups (to prevent further reaction) by exposure to a 1:3 mixture of acetic anhydride in pyridine for 1 hour. Other materials which may perform this residual capping function include trifluoroacetic anhydride, formic acetic anhydride, or other reactive acylating agents. Finally, the slides are washed again with DMF, methylene chloride, and ethanol.

[0541] 6. Synthesis of Eight Trimers of C and T

[0542] FIG. 2 illustrates a possible synthesis of the eight trimers of the two-monomer set: cytosine and thymine (represented by C and T, respectively). A glass slide bearing silane groups terminating in 6-nitroveratryloxycarboxamide (NVOC-NH) residues is prepared as a substrate. Active esters (pentafluorophenyl, OBt, etc.) of cytosine and thymine protected at the 5′ hydroxyl group with NVOC are prepared as reagents. While not pertinent to this example, if side chain protecting groups are required for the monomer set, these must not be photoreactive at the wavelength of light used to deprotect the primary chain.

[0543] For a monomer set of size n, n×l cycles are required to synthesize all possible sequences of length l. A cycle consists of:

[0544] 1. Irradiation through an appropriate mask to expose the 5′-OH groups at the sites where the next residue is to be added, with appropriate washes to remove the by-products of the deprotection.

[0545] 2. Addition of a single activated and protected (with the same photochemically-removable group) monomer, which will react only at the sites addressed in step 1, with appropriate washes to remove the excess reagent from the surface.

[0546] The above cycle is repeated for each member of the monomer set until each location on the surface has been extended by one residue in one embodiment. In other embodiments, several residues are sequentially added at one location before moving on to the next location. Cycle times will generally be limited by the coupling reaction rate, now as short as about 10 min in automated oligonucleotide synthesizers. This step is optionally followed by addition of a protecting group to stabilize the array for later testing. For some types of polymers (e.g., peptides), a final deprotection of the entire surface (removal of photoprotective side chain groups) may be required.

[0547] More particularly, as shown in FIG. 2A, the glass 20 is provided with regions 22, 24, 26, 28, 30, 32, 34, and 36. Regions 30, 32, 34, and 36 are masked, indicated by the hatched regions, as shown in FIG. 2B and the glass is irradiated at the bright regions 22, 24, 26, and 28, and exposed to a reagent containing a photosensitive blocked C (e.g., cytosine derivative), with the resulting structure shown in FIG. 2C. The substrate is carefully washed and the reactants removed. Thereafter, regions 22, 24, 26, and 28 are masked, as indicated by the hatched region, the glass is irradiated (as shown in FIG. 2D), as indicated by the bright regions, at 30, 32, 34, and 36, and exposed to a photosensitive blocked reagent containing T (e.g., thymine derivative), with the resulting structure shown in FIG. 2E. The process proceeds, consecutively masking and exposing the sections as shown until the structure shown in FIG. 2M is obtained. The glass is irradiated and the terminal groups are, optionally, capped by acetylation. As shown, all possible trimers of cytosine/thymine are obtained.
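The mask-and-couple cycle described above may be simulated in a few lines, purely as an illustrative sketch. The first layer follows the text (regions 22 to 28 receive C, regions 30 to 36 receive T); the masking patterns assumed for the second and third layers, the function name, and the representation of a chain as a string in order of addition are assumptions of this sketch and are not taken from FIG. 2.

```python
MONOMERS = ("C", "T")                        # the two-monomer set of the example
LENGTH = 3                                   # trimers
REGIONS = [22, 24, 26, 28, 30, 32, 34, 36]   # region labels as in FIG. 2A


def synthesize(regions, monomers, length):
    """Return ({region: sequence}, number_of_cycles) for a simple
    block-wise masking strategy (hypothetical mask layout)."""
    chains = {r: "" for r in regions}
    cycles = 0
    n = len(monomers)
    for layer in range(length):
        # Blocks of adjacent regions share a monomer; the block size shrinks
        # by a factor of n per layer so every region ends up unique.
        block = len(regions) // n ** (layer + 1)
        for m_index, monomer in enumerate(monomers):
            # "Irradiate" only the regions selected by this mask.
            exposed = [r for i, r in enumerate(regions)
                       if (i // block) % n == m_index]
            for r in exposed:
                chains[r] += monomer         # coupling only at exposed sites
            cycles += 1                      # one mask step plus one coupling
    return chains, cycles


if __name__ == "__main__":
    chains, cycles = synthesize(REGIONS, MONOMERS, LENGTH)
    print(cycles, chains)
```

With two monomers and three layers the sketch uses six cycles and leaves a different trimer in each of the eight regions.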

[0548] In this example, no side chain protective group removal is necessary, as might be common in modified nucleotides. If it is desired, side chain deprotection may be accomplished by treatment with ethanedithiol and trifluoroacetic acid.

[0549] In general, the number of steps needed to obtain a particular polymer chain is defined by:

n×l  (1)

[0550] where:

[0551] n=the number of monomers in the basis set of monomers, and

[0552] l=the number of monomer units in a polymer chain.

[0553] Conversely, the synthesized number of sequences of length l will be:

n^(l).  (2)

[0554] Of course, greater diversity is obtained by using masking strategies which will also include the synthesis of polymers having a length of less than l. If, in the extreme case, all polymers having a length less than or equal to l are synthesized, the number of polymers synthesized will be:

n^(l)+n^(l−1)+ . . . +n^(1).  (3)
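The three counts just given may be checked numerically with a short sketch. The function names are illustrative only; the third function sums n^k for k from 1 to l, i.e., all polymers of length less than or equal to l.

```python
def lithographic_steps(n: int, l: int) -> int:
    """Equation (1): number of steps for chains of l units from n monomers."""
    return n * l


def sequences_of_length(n: int, l: int) -> int:
    """Equation (2): number of distinct sequences of exactly length l."""
    return n ** l


def sequences_up_to_length(n: int, l: int) -> int:
    """Equation (3): number of sequences of every length from 1 to l."""
    return sum(n ** k for k in range(1, l + 1))


if __name__ == "__main__":
    for n, l in ((2, 3), (4, 10)):
        print(n, l, lithographic_steps(n, l),
              sequences_of_length(n, l), sequences_up_to_length(n, l))
```

For the C/T trimer example (n=2, l=3) this gives 6 steps, 8 trimers, and 14 sequences of length three or less; for 10-mers of four nucleotides it gives 40 steps and 1,048,576 sequences.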

[0555] The maximum number of lithographic steps needed will generally be n for each “layer” of monomers, i.e., the total number of masks (and, therefore, the number of lithographic steps) needed will be n×l. The size of the transparent mask regions will vary in accordance with the area of the substrate available for synthesis and the number of sequences to be formed. In general, the size of the synthesis areas will be:

size of synthesis areas=(A)/(S)

[0556] where:

[0557] A is the total area available for synthesis; and

[0558] S is the number of sequences desired in the area.
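As a hedged worked example of the relationship A/S, the sketch below computes the area and the edge length of a square synthesis region. The 1 cm² substrate area is an assumption chosen for illustration and is not a value stated in this example; the function names are likewise hypothetical.

```python
import math

UM2_PER_CM2 = 1e8  # 1 cm = 10,000 um, so 1 cm^2 = 1e8 um^2


def synthesis_area_um2(total_area_cm2: float, n_sequences: int) -> float:
    """Area per synthesis region, A/S, in square micrometers."""
    return total_area_cm2 * UM2_PER_CM2 / n_sequences


def square_side_um(total_area_cm2: float, n_sequences: int) -> float:
    """Edge length if each region is a square of area A/S."""
    return math.sqrt(synthesis_area_um2(total_area_cm2, n_sequences))


if __name__ == "__main__":
    # Assumed 1 cm^2 of substrate divided among all 4**10 possible 10-mers.
    print(synthesis_area_um2(1.0, 4 ** 10), square_side_um(1.0, 4 ** 10))
```

Under that assumption each of the 4¹⁰ regions would occupy roughly 95 μm², i.e., a square a little under 10 μm on a side.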

[0559] It will be appreciated by those of skill in the art that the above method could readily be used to simultaneously produce thousands or millions of oligomers on a substrate using the photolithographic techniques disclosed herein. Consequently, the method results in the ability to practically test large numbers of, for example, di, tri, tetra, penta, hexa, hepta, octa, nona, deca, even dodecanucleotides, or larger polynucleotides (or correspondingly, polypeptides).

[0560] The above example has illustrated the method by way of a manual example. It will of course be appreciated that automated or semi-automated methods could be used. The substrate would be mounted in a flow cell for automated addition and removal of reagents, to minimize the volume of reagents needed, and to more carefully control reaction conditions. Successive masks will be applicable manually or automatically. See, e.g., Pirrung et al. (1992) U.S. Pat. No. 5,143,854 and Ser. No. 07/624,120, now abandoned.

[0561] 7. Labeling of Target

[0562] The target oligonucleotide can be labeled using standard procedures referred to above. As discussed, for certain situations, a reagent which recognizes interaction, e.g., ethidium bromide, may be provided in the detection step. Alternatively, fluorescence labeling techniques may be applied, see, e.g., Smith, et al. (1986) Nature, 321:674-679; and Prober, et al. (1987) Science, 238:336-341. The techniques described therein will be followed with minimal modifications as appropriate for the label selected.

[0563] 8. Dimers of A, C, G, and T

[0564] The described technique may be applied, with photosensitive blocked nucleotides corresponding to adenine, cytosine, guanine, and thymine, to make combinations of polynucleotides consisting of each of the four different nucleotides. All 16 possible dimers would be made using a minor modification of the described method.
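The set of 16 dimers can be enumerated directly, as in the following minimal sketch. The function name and the alphabetical ordering of the bases are choices made for illustration.

```python
from itertools import product

BASES = "ACGT"


def all_oligomers(length: int, bases: str = BASES) -> list:
    """Every sequence of the given length over the base set, in
    lexicographic order of the bases as listed."""
    return ["".join(p) for p in product(bases, repeat=length)]


if __name__ == "__main__":
    dimers = all_oligomers(2)
    print(len(dimers), dimers)
```

The same function applies to longer oligomers, the number of sequences growing as 4 raised to the length.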

[0565] 9. 10-mers of A, C, G, and T

[0566] The described technique for making dimers of A, C, G, and T may be further extended to make longer oligonucleotides. The automated system described, e.g., in Pirrung et al. (1992) U.S. Pat. No. 5,143,854, and Ser. No. 07/624,120, now abandoned, can be adapted to make all possible 10-mers composed of the 4 nucleotides A, C, G, and T. The photosensitive, blocked nucleotide analogues have been described above, and would be readily adaptable to longer oligonucleotides.

[0567] 10. Specific Recognition Hybridization to 10-mers

[0568] The described hybridization conditions are directly applicable to the sequence specific recognition reagents attached to the substrate, produced as described immediately above. The 10-mers have an inherent property of hybridizing to a complementary sequence. For optimum discrimination between full matching and some mismatch, the conditions of hybridization should be carefully selected, as described above. Careful control of the conditions, and titration of parameters should be performed to determine the optimum collective conditions.

[0569] 11. Hybridization

[0570] Hybridization conditions are described in detail, e.g., in Hames and Higgins (1985) Nucleic Acid Hybridisation: A Practical Approach; and the considerations for selecting particular conditions are described, e.g., in Wetmur and Davidson, (1968) J. Mol. Biol. 31:349-370, and Wood et al. (1985) Proc. Natl. Acad. Sci. USA 82:1585-1588. As described above, conditions are desired which can distinguish matching along the entire length of the probe from hybridization in which one or more bases are mismatched. The length of incubation and conditions will be similar, in many respects, to the hybridization conditions used in Southern blot transfers. Typically, the GC bias may be minimized by the introduction of appropriate concentrations of the alkylammonium buffers, as described above.

[0571] Titration of the temperature and other parameters is desired to determine the optimum conditions for specificity and distinguishability of absolutely matched hybridization from mismatched hybridization.

[0572] A fluorescently labeled target or set of targets is generated, as described in Prober, et al. (1987) Science 238:336-341, or Smith, et al. (1986) Nature 321:674-679. Preferably, the target or targets are of the same length as, or slightly longer than, the oligonucleotide probes attached to the substrate and they will have known sequences. Thus, only a few of the probes hybridize perfectly with the target, and which particular ones did would be known.
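Because the target sequences are known, the probes expected to hybridize perfectly can be listed in advance. The following is a minimal sketch under stated assumptions: sequences are written 5′ to 3′, a perfect match is taken to be the reverse complement of a window of the target, and the 12-base target in the usage line is an invented example, not a sequence from this disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def perfect_match_probes(target: str, probe_length: int = 10) -> list:
    """Probes perfectly complementary to some window of the target,
    returned in sorted order with duplicates removed."""
    windows = {target[i:i + probe_length]
               for i in range(len(target) - probe_length + 1)}
    return sorted(reverse_complement(w) for w in windows)


if __name__ == "__main__":
    # Hypothetical 12-base target, slightly longer than the 10-mer probes.
    print(perfect_match_probes("ACGTTGCAAGCT"))
```

A target two bases longer than the probes has three windows, so at most three of the 10-mer positions on the substrate are expected to show perfect hybridization.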

[0573] The substrate and probes are incubated under appropriate conditions for a sufficient period of time to allow hybridization to completion. The time is measured to determine when the probe-target hybridizations have reached completion. A salt buffer which minimizes GC bias is preferred, incorporating, e.g., a buffer such as tetramethyl ammonium or tetraethyl ammonium ion at between about 2.4 and 3.0 M. See Wood, et al. (1985) Proc. Natl. Acad. Sci. USA 82:1585-1588. The incubation time is typically at least about 30 min, and may be as long as about 1-5 days. Typically very long matches will hybridize more quickly, very short matches will hybridize less quickly, depending upon relative target and probe concentrations. The hybridization will be performed under conditions where the reagents are stable for that time duration.

[0574] Upon maximal hybridization, the conditions for washing are titrated. Three parameters initially titrated are time, temperature, and cation concentration of the wash step. The matrix is scanned at various times to determine the conditions at which the distinguishability between true perfect hybrid and mismatched hybrid is optimized. These conditions will be preferred in the sequencing embodiments.
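One way such a titration might be summarized is sketched below. The use of a simple signal ratio as the measure of distinguishability, the class and function names, and every numerical value in the usage example are assumptions made for illustration; none is a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashScan:
    """One scan of the matrix under a given wash condition (hypothetical record)."""
    minutes: float
    temp_c: float
    cation_molar: float
    perfect_signal: float    # counts at a perfectly matched probe
    mismatch_signal: float   # counts at a mismatched probe

    @property
    def discrimination(self) -> float:
        # Ratio of perfect to mismatch signal; the floor avoids division by zero.
        return self.perfect_signal / max(self.mismatch_signal, 1e-9)


def best_condition(scans, min_perfect_signal: float = 0.0) -> WashScan:
    """Scan with the highest discrimination among those that still retain
    at least min_perfect_signal counts at the perfect match."""
    usable = [s for s in scans if s.perfect_signal >= min_perfect_signal]
    if not usable:
        raise ValueError("no scan retains enough perfect-match signal")
    return max(usable, key=lambda s: s.discrimination)


if __name__ == "__main__":
    # Placeholder values only, chosen to exercise the selection logic.
    scans = [
        WashScan(5, 25, 1.0, 1000, 800),
        WashScan(15, 37, 0.5, 900, 150),
        WashScan(30, 50, 0.1, 100, 5),
    ]
    print(best_condition(scans, min_perfect_signal=500))
```

The floor on the perfect-match signal reflects the practical point that a very stringent wash can improve the ratio while leaving too little signal to detect.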

[0575] 12. Positional Detection of Specific Interaction

[0576] As indicated above, the detection of specific interactions may be performed by detecting the positions where the labeled target sequences are attached. Where the label is a fluorescent label, the apparatus described, e.g., in Pirrung et al. (1992) U.S. Pat. No. 5,143,854; and Ser. No. 07/624,120, now abandoned, may be advantageously applied. In particular, the synthetic processes described above will result in a matrix pattern of specific sequences attached to the substrate, and a known pattern of interactions can be converted to corresponding sequences. In an alternative embodiment, a separate reagent which differentially interacts with the probe and interacted probe/targets can indicate where interaction occurs or does not occur. A single-strand specific reagent will indicate where no interaction has taken place, while a double-strand specific reagent will indicate where interaction has taken place. An intercalating dye, e.g., ethidium bromide, may be used to indicate the positions of specific interaction.

[0577] 13. Analysis

[0578] Conversion of the positional data into sequence specificity will provide the set of subsequences whose analysis by overlap segments may be performed, as described above. Analysis is provided by the methodology described above, or using, e.g., software available from the Genetic Engineering Center, P.O. Box 794, 11000 Belgrade, Yugoslavia (Yugoslav group). See, also, Macevicz, PCT publication no. WO 90/04652, which is hereby incorporated herein by reference.
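The idea of assembling a sequence from overlapping subsequences can be illustrated with a deliberately simplified sketch. It is not the methodology or the software referred to above: it assumes that all subsequences have the same length k, that the set is complete and error free, and that no (k−1)-base overlap occurs more than once in the target, and it raises an error otherwise. The function name is hypothetical.

```python
def reconstruct(subsequences) -> str:
    """Rebuild a linear sequence from its set of overlapping k-mers by
    chaining each k-mer to the one that begins with its last k-1 bases."""
    kmers = set(subsequences)
    if not kmers:
        return ""
    k = len(next(iter(kmers)))
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:k - 1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the only one whose prefix is nobody's suffix.
    starts = [kmer for kmer in kmers if kmer[:k - 1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting subsequence")
    current = starts[0]
    sequence = current
    used = {current}
    while True:
        candidates = [c for c in by_prefix.get(current[1:], []) if c not in used]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("ambiguous overlap: repeated (k-1)-mer")
        current = candidates[0]
        used.add(current)
        sequence += current[-1]   # each overlap step adds one new base
    if len(used) != len(kmers):
        raise ValueError("not all subsequences could be placed")
    return sequence


if __name__ == "__main__":
    target = "ACGTTGCAAGCT"   # hypothetical sequence for demonstration
    kmers = {target[i:i + 4] for i in range(len(target) - 3)}
    print(reconstruct(kmers) == target)
```

Repeated overlaps are the point at which a real analysis needs additional information, which is why the sketch stops with an error rather than guessing.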

[0579] B. Polypeptide

[0580] The description of the preparation of short peptides on a substrate incorporates by reference sections of Pirrung et al. (1992) U.S. Pat. No. 5,143,854, and is set out below.

[0581] 1. Slide Preparation

[0582] Preparation of the substrate follows that described above for nucleotides.

[0583] 2. Linker Attachment, Blocking of Free Sites

[0584] The aminated surface of the slide is exposed to about 500 μl of, e.g., a 30 millimolar (mM) solution of NVOC-GABA (gamma amino butyric acid) NHS (N-hydroxysuccinimide) in DMF for attachment of a NVOC-GABA to each of the amino groups. The surface is washed with, for example, DMF, methylene chloride, and ethanol. See Ser. No. 07/624,120, now abandoned, for details on amino acid chemistry.

[0585] Any unreacted aminopropyl silane groups on the surface, i.e., those amino groups which have not had the NVOC-GABA attached, are now capped with acetyl groups (to prevent further reaction) by exposure to a 1:3 mixture of acetic anhydride in pyridine for 1 hour. Other materials which may perform this residual capping function include trifluoroacetic anhydride, formic acetic anhydride, or other reactive acylating agents. Finally, the slides are washed again with DMF, methylene chloride, and ethanol.

[0586] 3. Synthesis of 8 Trimers of “A” and “B”

[0587] See Pirrung et al. (1992) U.S. Pat. No. 5,143,854 which describes the preparation of glycine and phenylalanine trimers. The technique is similar to the method described above for making trimers of C and T, but substituting photosensitive blocked glycine for the C derivative and photosensitive blocked phenylalanine for the T derivative.

[0588] 4. Synthesis of a Dimer of an Aminopropyl Group and a Fluorescent Group

[0589] In synthesizing the dimer of an aminopropyl group and a fluorescent group, a functionalized Durapore™ membrane was used as a substrate. The Durapore™ membrane was a polyvinylidene difluoride with aminopropyl groups. The aminopropyl groups were protected with the DDZ group by reaction of the carbonyl chloride with the amino groups, a reaction readily known to those of skill in the art. The surface bearing these groups was placed in a solution of THF and contacted with a mask bearing a checkerboard pattern of 1 mm opaque and transparent regions. The mask was exposed to ultraviolet light having a wavelength down to at least about 280 nm for about 5 minutes at ambient temperature, although a wide range of exposure times and temperatures may be appropriate in various embodiments of the invention. For example, in one embodiment, an exposure time of between about 1 and 5000 seconds may be used at process temperatures of between −70 and +50° C.

[0590] In one preferred embodiment, exposure times of between about 1 and 500 seconds at about ambient pressure are used. In some preferred embodiments, pressure above ambient is used to prevent evaporation.

[0591] The surface of the membrane was then washed for about 1 hour with a fluorescent label which included an active ester bound to a chelate of a lanthanide. Wash times will vary over a wide range of values from about a few minutes to a few hours. These materials fluoresce in the red and the green visible region. After the reaction with the active ester in the fluorophore was complete, the locations in which the fluorophore was bound could be visualized by exposing them to ultraviolet light and observing the red and the green fluorescence. It was observed that the derivatized regions of the substrate closely corresponded to the original pattern of the mask.

[0592] 5. Demonstration of Signal Capability

[0593] Signal detection capability was demonstrated using a low-level standard fluorescent bead kit manufactured by Flow Cytometry Standards and having model no. 824. This kit includes 5.8 μm diameter beads, each impregnated with a known number of fluorescein molecules.

[0594] One of the beads was placed in the illumination field on the scan stage in a field of a laser spot which was initially shuttered. After the bead was positioned in the illumination field, the photon detection equipment was turned on. The laser beam was unblocked and it interacted with the particle bead, which then fluoresced. Fluorescence curves of beads impregnated with 7,000 and 29,000 fluorescein molecules are shown in FIGS. 11A and 11B, respectively, of Pirrung et al. (1992) U.S. Pat. No. 5,143,854. On each curve, traces for beads without fluorescein molecules are also shown. These experiments were performed with 488 nm excitation, with 100 μW of laser power. The light was focused through a 40 power 0.75 NA objective.

[0595] The fluorescence intensity in all cases started off at a high value and then decreased exponentially. The fall-off in intensity is due to photobleaching of the fluorescein molecules. The traces of beads without fluorescein molecules are used for background subtraction. The difference in the initial exponential decay between labeled and nonlabeled beads is integrated to give the total number of photon counts, and this number is related to the number of molecules per bead. Therefore, it is possible to deduce the number of photons per fluorescein molecule that can be detected. This calculation indicates that about 40 to 50 photons per fluorescein molecule are detected.
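The calculation described above (background subtraction, integration of the decaying signal, and division by the known number of molecules per bead) may be sketched as follows. The traces in the usage example are synthetic: the amplitude, decay constant, and background were chosen by hand so that the result falls in the reported range, and they are not the measured curves. The function names and the use of trapezoidal integration are likewise assumptions.

```python
import math


def synthetic_trace(times, amplitude, tau, background):
    """Exponentially decaying count rate on a flat background (made-up data)."""
    return [amplitude * math.exp(-t / tau) + background for t in times]


def integrated_counts(times, labeled, blank):
    """Trapezoidal integral over time of the background-subtracted trace."""
    diff = [a - b for a, b in zip(labeled, blank)]
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (diff[i] + diff[i - 1]) * (times[i] - times[i - 1])
    return total


def photons_per_molecule(times, labeled, blank, molecules_per_bead):
    """Detected photon counts divided by the number of fluorescein molecules."""
    return integrated_counts(times, labeled, blank) / molecules_per_bead


if __name__ == "__main__":
    times = [i * 0.01 for i in range(2001)]                 # 0 to 20 s
    labeled = synthetic_trace(times, 157500.0, 2.0, 500.0)  # bead with dye
    blank = synthetic_trace(times, 0.0, 2.0, 500.0)         # bead without dye
    print(photons_per_molecule(times, labeled, blank, 7000))
```

The bead without fluorescein contributes only the flat background, so subtracting it leaves the photobleaching decay whose area gives the total detected counts.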

[0596] 6. Determination of the Number of Molecules Per Unit Area

[0597] Aminopropylated glass microscope slides prepared according to the methods discussed above were utilized in order to establish the density of labeling of the slides. The free amino termini of the slides were reacted with FITC (fluorescein isothiocyanate) which forms a covalent linkage with the amino group. The slide is then scanned to count the number of fluorescent photons generated in a region which, using the estimated 40-50 photons per fluorescent molecule, enables the calculation of the number of molecules which are on the surface per unit area.

[0598] A slide with aminopropyl silane on its surface was immersed in a 1 mM solution of FITC in DMF for 1 hour at about ambient temperature. After reaction, the slide was washed twice with DMF and then washed with ethanol, water, and then ethanol again. It was then dried and stored in the dark until it was ready to be examined.

[0599] Through the use of curves similar to those shown in FIG. 11 of Pirrung et al. (1992) U.S. Pat. No. 5,143,854, and by integrating the fluorescent counts under the exponentially decaying signal, the number of free amino groups on the surface after derivatization was determined. It was determined that slides with labeling densities of 1 fluorescein per 10³×10³ to ~2×2 nm could be reproducibly made as the concentration of aminopropyltriethoxysilane varied from 10⁻⁵% to 10⁻¹%.
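The stated labeling densities can be converted to molecules per unit area with simple arithmetic, sketched below. The sketch assumes that each quoted figure describes a square patch of surface, with side lengths in nanometers, occupied by a single fluorescein; the function name is illustrative.

```python
NM2_PER_UM2 = 1_000_000.0  # 1 um = 1000 nm, so 1 um^2 = 1e6 nm^2


def molecules_per_um2(side_nm: float) -> float:
    """Surface density when one molecule occupies a square of the given side."""
    return NM2_PER_UM2 / (side_nm * side_nm)


if __name__ == "__main__":
    for side in (1000.0, 2.0):
        print(f"1 per {side:g} x {side:g} nm -> {molecules_per_um2(side):g} per um^2")
```

On this reading the reported range spans from about one fluorescein per square micrometer to about 250,000 per square micrometer.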

[0600] 7. Removal of NVOC and Attachment of a Fluorescent Marker

[0601] NVOC-GABA groups were attached as described above. The entire surface of one slide was exposed to light so as to expose a free amino group at the end of the gamma amino butyric acid. This slide, and a duplicate which was not exposed, were then exposed to fluorescein isothiocyanate (FITC).

[0602] FIG. 12A of Pirrung et al. (1992) U.S. Pat. No. 5,143,854 illustrates the slide which was not exposed to light, but which was exposed to FITC. The units of the x axis are time and the units of the y axis are counts. The trace contains a certain amount of background fluorescence. The duplicate slide was exposed to 350 nm broadband illumination for about 1 minute (12 mW/cm², ~350 nm illumination), washed and reacted with FITC. A large increase in the level of fluorescence is observed, which indicates photolysis has exposed a number of amino groups on the surface of the slides for attachment of a fluorescent marker.

[0603] 8. Use of a Mask in Removal of NVOC

[0604] The next experiment was performed with a 0.1% aminopropylated slide. Light from a Hg-Xe arc lamp was imaged onto the substrate through a laser-ablated chrome-on-glass mask in direct contact with the substrate.

[0605] This slide was illuminated for approximately 5 minutes, with 12 mW of 350 nm broadband light and then reacted with the 1 mM FITC solution. It was put on the laser detection scanning stage and a graph was plotted as a two-dimensional representation of position color-coded for fluorescence intensity. The experiment was repeated a number of times through various masks. The fluorescence patterns for a 100×100 μm mask, a 50 μm mask, a 20 μm mask, and a 10 μm mask indicate that the mask pattern is distinct down to at least about 10 μm squares using this lithographic technique.

[0606] 9. Attachment of YGGFL and Subsequent Exposure to Herz Antibody and Goat Anti-Mouse Antibody

[0607] In order to establish that receptors to a particular polypeptide sequence would bind to a surface-bound peptide and be detected, Leu enkephalin was coupled to the surface and recognized by an antibody. A slide was derivatized with 0.1% aminopropyltriethoxysilane and protected with NVOC. A 500 μm checkerboard mask was used to expose the slide in a flow cell using backside contact printing. The Leu enkephalin sequence (H₂N-tyrosine, glycine, glycine, phenylalanine, leucine-COOH, otherwise referred to herein as YGGFL) was attached via its carboxy end to the exposed amino groups on the surface of the slide. The peptide was added in DMF solution with the BOP/HOBT/DIEA coupling reagents and recirculated through the flow cell for 2 hours at room temperature.

[0608] A first antibody, known as the Herz antibody, was applied to the surface of the slide for 45 minutes at 2 μg/ml in a supercocktail (containing 1% BSA and 1% ovalbumin also in this case). A second antibody, goat anti-mouse fluorescein conjugate, was then added at 2 μg/ml in the supercocktail buffer, and allowed to incubate for 2 hours.

[0609] The results of this experiment were plotted as fluorescence intensity as a function of position. This image was taken at 10 μm steps and showed that not only can deprotection be carried out in a well defined pattern, but also that (1) the method provided for successful coupling of peptides to the surface of the substrate, (2) the surface of a bound peptide was available for binding with an antibody, and (3) the detection apparatus capabilities were sufficient to detect binding of a receptor. Moreover, the Herz antibody is a sequence specific reagent which may be used advantageously as a sequence specific recognition reagent. It may be used, if specificity is high, for sequencing purposes, and, at least, for fingerprinting and mapping uses.

[0610] 10. Monomer-By-Monomer Formation of YGGFL and Subsequent Exposure to Labeled Antibody

[0611] Monomer-by-monomer synthesis of YGGFL and GGFL in alternate squares was performed on a slide in a checkerboard pattern and the resulting slide was exposed to the Herz antibody.

[0612] A slide was derivatized with the aminopropyl group, protected in this case with t-BOC (t-butoxycarbonyl). The slide was treated with TFA to remove the t-BOC protecting group. ε-Aminocaproic acid, which was t-BOC protected at its amino group, was then coupled onto the aminopropyl groups. The aminocaproic acid serves as a spacer between the aminopropyl group and the peptide to be synthesized. The amino end of the spacer was deprotected and coupled to NVOC-leucine. The entire slide was then illuminated with 12 mW of 325 nm broadband illumination. The slide was then coupled with NVOC-phenylalanine and washed. The entire slide was again illuminated, then coupled to NVOC-glycine and washed. The slide was again illuminated and coupled to NVOC-glycine to form the sequence shown in the last portion of FIG. 13A of Pirrung et al. (1992) U.S. Pat. No. 5,143,854.

[0613] Alternating regions of the slide were then illuminated using a projection print using a 500×500 μm checkerboard mask; thus, the amino group of glycine was exposed only in the lighted areas. When the next coupling chemistry step was carried out, NVOC-tyrosine was added, and it coupled only at those spots which had received illumination. The entire slide was then illuminated to remove all the NVOC groups, leaving a checkerboard of YGGFL in the lighted areas and in the other areas, GGFL. The Herz antibody (which recognizes the YGGFL, but not GGFL) was then added, followed by goat anti-mouse fluorescein conjugate.

[0614] The resulting fluorescence scan showed dark areas containing the tetrapeptide GGFL, which is not recognized by the Herz antibody (and thus there is no binding of the goat anti-mouse antibody with fluorescein conjugate), and red areas in which YGGFL was present. The YGGFL pentapeptide is recognized by the Herz antibody and, therefore, there is antibody in the lighted regions for the fluorescein-conjugated goat anti-mouse to recognize.
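The logic of this experiment can be modeled in a few lines, purely as an illustration. The grid size, the function names, and the all-or-nothing binding rule are assumptions of the sketch; the rule simply encodes the reported observation that the Herz antibody recognizes YGGFL but not GGFL.

```python
def checkerboard(rows: int, cols: int):
    """True where the mask is transparent, i.e., where the slide is lit."""
    return [[(r + c) % 2 == 0 for c in range(cols)] for r in range(rows)]


def synthesize_pattern(rows: int = 4, cols: int = 4):
    """GGFL is built over the whole slide; tyrosine couples only where the
    terminal glycine was deprotected by light through the mask."""
    base = "GGFL"   # written from the free amino end toward the surface
    mask = checkerboard(rows, cols)
    return [["Y" + base if lit else base for lit in row] for row in mask]


def antibody_binds(peptide: str) -> bool:
    """Stand-in for the reported specificity of the Herz antibody."""
    return peptide == "YGGFL"


if __name__ == "__main__":
    for row in synthesize_pattern():
        # '#' marks squares expected to fluoresce, '.' marks dark squares
        print("".join("#" if antibody_binds(p) else "." for p in row))
```

The printed pattern reproduces the mask, which is the qualitative result the fluorescence scan is described as showing.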

[0615] Similar patterns for a 50 μm mask used in direct contact (“proximity print”) with the substrate provided a pattern which was more distinct and the corners of the checkerboard pattern were touching as a result of the mask being placed in direct contact with the substrate (which reflects the increase in resolution using this technique).

[0616] 11. Monomer-By-Monomer Synthesis of YGGFL and PGGFL

[0617] A synthesis using a 50 μm checkerboard mask was conducted. However, P was added to the GGFL sites on the substrate through an additional coupling step. P was added by exposing protected GGFL to light through a mask, and subsequent exposure to P in the manner set forth above. Therefore, half of the regions on the substrate contained YGGFL and the remaining half contained PGGFL.

[0618] The fluorescence plot for this experiment showed that the regions were again readily discernible between those in which binding did and did not occur. This experiment demonstrated that antibodies are able to recognize a specific sequence and that the recognition is not length-dependent.

[0619] 12. Monomer-By-Monomer Synthesis of YGGFL and YPGGFL

[0620] In order to further demonstrate the operability of the invention,a 50 μm checkerboard pattern of alternating YGGFL and YPGGFL wassynthesized on a substrate using techniques like those set forth above.The resulting fluorescence plot showed that the antibody was clearlyable to recognize the YGGFL sequence and did not bind significantly atthe YPGGFL regions.

[0621] 13. Synthesis of an Array of Sixteen Different Amino Acid Sequences and Estimation of Relative Binding Affinity to Herz Antibody

[0622] Using techniques similar to those set forth above, an array of 16 different amino acid sequences (replicated four times) was synthesized on each of two glass substrates. The sequences were synthesized by attaching the sequence NVOC-GFL across the entire surface of the slides. Using a series of masks, two layers of amino acids were then selectively applied to the substrate. Each region had dimensions of 0.25 cm × 0.0625 cm. The first slide contained amino acid sequences containing only L-amino acids while the second slide contained selected D-amino acids. Various regions on the first and second slides were duplicated four times on each slide. The slides were then exposed to the Herz antibody and fluorescein-labeled goat anti-mouse antibodies.
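The way two masked coupling layers over a common NVOC-GFL base can yield 16 distinct sequences may be illustrated with a short sketch. The monomer choices below (G, A, S, P for the first layer and Y, L, F, W for the second) are an assumption inferred from the L-amino acid sequences listed in Table 6, and the function name build_array is hypothetical; the sketch shows only the combinatorial bookkeeping, not the masking chemistry.

```python
from itertools import product

def build_array(base, first_layer, second_layer):
    """Return every sequence formed by adding one monomer from each layer to the base.

    Synthesis proceeds outward from the surface, so the monomer coupled in the
    second layer ends up at the N-terminal (leftmost) position.
    """
    return [second + first + base
            for second, first in product(second_layer, first_layer)]

# Assumed monomer sets, inferred from the L-amino acid column of Table 6.
sequences = build_array("GFL", first_layer="GASP", second_layer="YLFW")

print(len(sequences))  # 16
print(sequences[0])    # YGGFL
```

Four monomers at each of two positions give 4 × 4 = 16 combinations, which is consistent with the array size stated above.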

[0623] A fluorescence plot of the first slide, which contained only L-amino acids, showed red areas (indicating strong binding, i.e., 149,000 counts or more) and black areas (indicating little or no binding of the Herz antibody, i.e., 20,000 counts or less). The sequence YGGFL was clearly most strongly recognized. The sequences YAGFL and YSGFL also exhibited strong recognition by the antibody. By contrast, most of the remaining sequences showed little or no binding. The four duplicate portions of the slide were extremely consistent in the amount of binding shown therein.
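The count thresholds given above lend themselves to a simple classification and ranking of regions. The following is a minimal sketch under stated assumptions: the region names and counts are hypothetical placeholders rather than measured data, the "intermediate" label for counts between the two thresholds is introduced here for illustration, and averaging over the four replicates is one reasonable choice rather than a procedure described in the text.

```python
STRONG_MIN = 149_000  # counts at or above this were displayed as strong binding
WEAK_MAX = 20_000     # counts at or below this were displayed as little or no binding

def classify(counts):
    """Label a fluorescence reading using the two thresholds from the text."""
    if counts >= STRONG_MIN:
        return "strong"
    if counts <= WEAK_MAX:
        return "little or none"
    return "intermediate"  # label assumed here; the text names only the two extremes

def rank_by_mean(replicates):
    """Return region names sorted by mean counts over replicates, highest first."""
    means = {name: sum(vals) / len(vals) for name, vals in replicates.items()}
    return sorted(means, key=means.get, reverse=True)

# Hypothetical counts for four replicate regions; not data from the experiment.
example = {
    "region_B": [90_000, 91_500, 89_000, 90_500],
    "region_A": [150_500, 151_000, 149_800, 150_200],
    "region_C": [12_000, 11_500, 12_500, 12_000],
}

ranking = rank_by_mean(example)
labels = {name: classify(sum(v) / len(v)) for name, v in example.items()}
print(ranking)
print(labels)
```

Ranking by mean signal in this way corresponds to ordering sequences by relative fluorescence.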

[0624] A fluorescence plot of the D-amino acid slide indicated that strongest binding was exhibited by the YGGFL sequence. Significant binding was also detected to YaGFL, YsGFL, and YpGFL. The remaining sequences showed less binding with the antibody. Low binding efficiency of the sequence yGGFL was observed.

[0625] Table 6 lists the various sequences tested in order of relative fluorescence, which provides information regarding relative binding affinity.

TABLE 6
Apparent Binding to Herz Ab

L-a.a. Set    D-a.a. Set
YGGFL         YGGFL
YAGFL         YaGFL
YSGFL         YsGFL
LGGFL         YpGFL
FGGFL         fGGFL
YPGFL         yGGFL
LAGFL         faGFL
FAGFL         wGGFL
WGGFL         yaGFL
              fpGFL
              waGFL

[0626] 14. Illustrative Alternative Embodiment

[0627] According to an alternative embodiment of the invention, the methods provide for attaching to the surface a caged binding member which, in its caged form, has a relatively low affinity for other potentially binding species, such as receptors and specific binding substances. Such techniques are more fully described in copending application Ser. No. 404,920, filed Sep. 8, 1989, and incorporated herein by reference for all purposes. See also Ser. No. 07/435,316, now abandoned, and Barrett et al. (1993) U.S. Pat. No. 5,252,743, each of which is hereby incorporated herein by reference.

[0628] According to this alternative embodiment, the invention provides methods for forming predefined regions on a surface of a solid support, wherein the predefined regions are capable of immobilizing receptors. The methods make use of caged binding members attached to the surface to enable selective activation of the predefined regions. The caged binding members are liberated to act as binding members ultimately capable of binding receptors upon selective activation of the predefined regions. The activated binding members are then used to immobilize specific molecules such as receptors on the predefined region of the surface. The above procedure is repeated at the same or different sites on the surface so as to provide a surface prepared with a plurality of regions on the surface containing, for example, the same or different receptors. When receptors immobilized in this way have a differential affinity for one or more ligands, screenings and assays for the ligands can be conducted in the regions of the surface containing the receptors.

[0629] The alternative embodiment may make use of novel caged binding members attached to the substrate. Caged (unactivated) members have a relatively low affinity for receptors of substances that specifically bind to uncaged binding members when compared with the corresponding affinities of activated binding members. Thus, the binding members are protected from reaction until a suitable source of energy is applied to the regions of the surface desired to be activated. Upon application of a suitable energy source, the caging groups labilize, thereby presenting the activated binding member. A typical energy source will be light.

[0630] Once the binding members on the surface are activated they may be attached to a receptor. The receptor chosen may be a monoclonal antibody, a nucleic acid sequence, a drug receptor, etc. The receptor will usually, though not always, be prepared so as to permit attaching it, directly or indirectly, to a binding member. For example, a specific binding substance having a strong binding affinity for the binding member and a strong affinity for the receptor or a conjugate of the receptor may be used to act as a bridge between binding members and receptors if desired. The method uses a receptor prepared such that the receptor retains its activity toward a particular ligand.

[0631] Preferably, the caged binding member attached to the solid substrate will be a photoactivatable biotin complex, i.e., a biotin molecule that has been chemically modified with photoactivatable protecting groups so that it has a significantly lower binding affinity for avidin or avidin analogs than does natural biotin. In a preferred embodiment, the protecting groups localized in a predefined region of the surface will be removed upon application of a suitable source of radiation to give binding members, that is, biotin or a functionally analogous compound having substantially the same binding affinity for avidin or avidin analogs as does biotin.

[0632] In another preferred embodiment, avidin or an avidin analog is incubated with activated binding members on the surface until the avidin binds strongly to the binding members. The avidin so immobilized on predefined regions of the surface can then be incubated with a desired receptor or conjugate of a desired receptor. The receptor will preferably be biotinylated, e.g., a biotinylated antibody, when avidin is immobilized on the predefined regions of the surface. Alternatively, a preferred embodiment will present an avidin/biotinylated receptor complex, which has been previously prepared, to activated binding members on the surface.

[0633] II. Fingerprinting

[0634] The above section on generation of reagents for sequencing provides specific reagents useful for fingerprinting applications. Fingerprinting embodiments may be applied towards polynucleotide fingerprinting, polypeptide fingerprinting, cell and tissue classification, cell and tissue temporal development stage classification, diagnostic tests, forensic uses for individual identification, classification of organisms, and genetic screening of individuals. Mapping applications are also described below.

[0635] A. Polynucleotide Fingerprint

[0636] Polynucleotide fingerprinting may use reagents similar to those described above for probing a sequence for the presence of specific subsequences found therein. Typically, the subsequences used for fingerprinting will be longer than the sequences used in oligonucleotide sequencing. In particular, specific long segments may be used to determine the similarity of different samples of nucleic acids. They may also be used to determine by fingerprinting whether specific combinations of information are provided therein. Particular probe sequences are selected and attached in a positional manner to a substrate. The means for attachment may be either using a caged biotin method described, e.g., in Barrett et al. (1993) U.S. Pat. No. 5,252,743, or by another method using targeting molecules. For example, a short polypeptide of specific sequence may be attached to an oligonucleotide and targeted to specific positions on a substrate having antibodies attached thereto, the antibodies exhibiting specificity for binding to those short peptide sequences. In another embodiment, an unnatural nucleotide or similar complementary binding molecule may be attached to the fingerprinting probe and the probe thereby directed towards complementary sequences on a VLSIPS substrate. Typically, unnatural nucleotides would be preferred, e.g., unnatural optical isomers, which would not interfere with natural nucleotide interactions.

[0637] Having produced a substrate with particular fingerprint probes attached thereto at positionally defined regions, the substrate may be used in a manner quite similar to the sequencing embodiment to provide information as to whether the fingerprint probes are detecting the corresponding sequence in a target sequence. This will often provide information similar to a Southern blot hybridization.
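Comparison of two samples on such a substrate amounts to comparing the sets of probe positions at which each sample gave a signal. The sketch below illustrates one way this might be done; the signal values, the detection threshold, and the use of the Jaccard index as a similarity score are all assumptions made for illustration and are not prescribed by the methods described herein.

```python
def positive_positions(signals, threshold):
    """Return the set of probe positions whose signal meets the threshold."""
    return {pos for pos, value in signals.items() if value >= threshold}

def pattern_similarity(a, b):
    """Jaccard index of two position sets: shared positives over all positives."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical signals keyed by (row, column) position on the substrate.
target = {(0, 0): 5200, (0, 1): 300, (1, 0): 4800, (1, 1): 250}
reference = {(0, 0): 5100, (0, 1): 280, (1, 0): 310, (1, 1): 260}

THRESHOLD = 1000  # assumed cutoff separating signal from background
target_hits = positive_positions(target, THRESHOLD)
reference_hits = positive_positions(reference, THRESHOLD)

similarity = pattern_similarity(target_hits, reference_hits)
print(sorted(target_hits), sorted(reference_hits), similarity)
```

A score of 1.0 indicates identical patterns of positive positions, while lower values indicate that the two samples differ at one or more probe positions.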

[0638] B. Polypeptide Fingerprint

[0639] A polypeptide fingerprint may be performed using antibodies which recognize specific antigens on the polypeptide. For example, monoclonal antibodies which recognize specific sequences or antigens on a polypeptide may be used to determine whether those epitopes are found on a particular protein. Particular patterns of epitopes would be found on various types of proteins. This will lead to the identification of specific epitopes, or antigenic determinants, which are characteristic of, e.g., beta sheet segments, as well as of particular types of domains in various protein types. Thus, a screening method may be devised which can classify polypeptides, either native or denatured, into various new classes defined by the epitopes existing thereon.

[0640] In addition, once the substrate is generated in the manners described above, a target peptide is exposed to the substrate. The target may be either native or denatured, though the conditions used to denature the polypeptide may interfere with the specific interaction between the polypeptide and the recognition reagent. This method does not depend on the polypeptide being a single chain; thus, protein complexes may also be fingerprinted using this methodology. Structures such as multi-subunit proteins, associations of proteins, ribosomes, nucleosomes, and other small cellular structures may also be fingerprinted and classified according to the presence of specific recognizable features thereon.

[0641] Peptide fingerprinting may be useful, for example, in correlating with particular physiological conditions or developmental stages of a cell or organism. Thus, a biological sample may be fingerprinted to determine the presence in that sample of a plurality of different polypeptides which are each individually fingerprinted. In an alternative embodiment, a polypeptide itself is not fingerprinted but a biological sample is fingerprinted searching for specific epitopes, e.g., polypeptide, carbohydrate, nucleic acid, or any of a number of other specific recognizable structural features.

[0642] The conditions for the interactions using antibodies are described, e.g., in Harlow and Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Press, New York. The conditions should be titrated for temperature, buffer composition, time, and other important parameters in an antibody interaction.

[0643] C. Cell Classification Scheme

[0644] The present invention can be used for cell classification using fingerprinting type technology as described above in the polypeptide fingerprint. Classes of cells are typically defined by the presence of common functions which are usually reflected by structural features. Thus, a plant cell is classified differently from an animal cell by a number of structural features. Given an unknown cell, the present invention provides improved means for distinguishing the different cell types. Once a cell classification scheme is developed and the structural features which define it are identified using the present invention, homogeneous cell populations expressing these features may be separated from others. Standard cell sorters may be coupled with recognition reagents and labels which can distinguish various cell types.

[0645] a. T-Cell Classes

[0646] T-cell classes are defined on the basis of expression of particular antigens characteristic of each class. For example, mouse T-cell differentiation markers include the LY antigens. With the plurality of different antigens which may be tested using antibody or other recognition reagents, new populations and classes of cells may be defined. For example, different neural cell types may be defined on the basis of cell surface antigens. Different tissue types will be defined on the basis of tissue specific antigens. Developmental cell classes will be similarly defined. All of these screenings can make use of the VLSIPS substrates with specific recognition molecules attached thereto. The substrates are exposed to the cell types directly, assaying for attachment of cells to specific regions, or are exposed to products of a population of cells, e.g., a supernatant, or a cell lysate.

[0647] Once a cell classification scheme has been correlated with specific structural markers therein, reagents which recognize those features may be developed and used in a fluorescence activated cell sorter as described, e.g., in Dangl, J. and Herzenberg (1982) J. Immunological Methods 52: 1-14; and Becton Dickinson, Fluorescence Activated Cell Sorters Division, San Jose, Calif. This will provide a homogeneous population of cells whose function has been defined by structure.

[0648] b. B-Cell Classes

[0649] The present cell classification scheme may also be used to determine specific B-cell classes. For example, B-cells specific for producing IgM, IgG, IgD, IgE, and IgA may be defined by the internal expression of specific mRNA sequences encoding each type of immunoglobulin. The classification scheme may depend on either extracellularly expressed markers which are correlated as being diagnostic of specific stages in development, or intracellular mRNA sequences which indicate particular functions.

[0650] D. Temporal Development Scheme

[0651] 1. Developmental Antigens

[0652] The present fingerprinting invention also allows cell classification by expression of developmental antigens. For example, a lymphocyte stem cell expresses a particular combination of antigens. As the lymphocyte develops through a programmed developmental scheme, at various stages it expresses particular antigens which are diagnostic of particular stages in development. Again, the fingerprinting methodology allows for the definition of specific structural features which are diagnostic of developmental or functional features which will allow classification of cells into temporal developmental classes. Cells, products of those cells, or lysates of those cells will be assayed to determine the developmental stage of the source cells. In this manner, once a developmental stage is defined, specific synchronized populations of cells will be selected out of another population. These synchronized populations may be very important in determining the biological mechanisms of development.

[0653] 2. Developmental mRNA Expression

[0654] Besides expressed antigens, the present invention also allows for fingerprinting of the mRNA population of a cell. In this fashion, the mRNA population, which should be a good determinant of developmental stage, will be correlated with other structural features of the cell. In this manner, cells at specific developmental stages will be characterized by the intracellular environment, as well as the extracellular environment. The present invention also allows the combination of definitions based, in part, upon antigens and, in part, upon mRNA expression.

[0655] In one embodiment, the two may be combined in a single incubation step. A particular incubation condition may be found which is compatible with both hybridization recognition and non-hybridization recognition molecules. Thus, e.g., an incubation condition may be selected which allows both specificity of antibody binding and specificity of nucleic acid hybridization. This allows simultaneous performance of both types of interactions on a single matrix. Again, where developmental mRNA patterns are correlated with structural features, or with probes which are able to hybridize to intracellular mRNA populations, a cell sorter may be used to sort specifically those cells having desired mRNA population patterns.

[0656] E. Diagnostic Tests

[0657] The present invention also provides the ability to perform diagnostic tests. Diagnostic tests typically are based upon a fingerprint type assay, which tests for the presence of specific diagnostic structural features. Thus, the present invention provides means for viral strain identification, bacterial strain identification, and other diagnostic tests using positionally defined specific reagents. The present invention also allows for determining a spectrum of allergies, diagnosing a biological sample for any or all of the above, and testing for many other conditions.

[0658] 1. Viral Identification

[0659] The present invention provides reagents and methodology for identifying viral strains. The specific reagents may be either antibodies or recognition proteins which bind to specific viral epitopes, preferably surface exposed, but may also make use of internal epitopes, e.g., in a denatured viral sample. In an alternative embodiment, the viral genome may be probed for specific sequences which are characteristic of particular viral strains. As above, a combination of the two may be performed simultaneously in a single interaction step, or in separate tests, e.g., for both genetic characteristics and epitope characteristics.

[0660] 2. Bacterial Identification

[0661] Similar techniques will be applicable to identifying a bacterial source. This may be useful in diagnosing bacterial infections, or in classifying sources of particular bacterial species. For example, the bacterial assay may be useful in determining the natural range of survivability of particular strains of bacteria across regions of the country or in different ecological niches.

[0662] 3. Other Microbiological Identifications

[0663] The present invention not only provides means for diagnosis of other microbiological and other species, e.g., protozoal species and parasitic species in a biological sample, but also provides the means for assaying a combination of different infections. For example, a biological specimen may be assayed for the presence of any or all of these microbiological species. In human diagnostic uses, typical samples will be blood, sputum, stool, urine, or other samples.

[0664] 4. Allergy Tests

[0665] An immobilized set of antigens may be attached to a solid substrate and, instead of the standard skin reaction tests, a blood sample may be assayed on such a substrate to determine the presence of antibodies, e.g., IgE or other type antibodies, which may be diagnostic of an allergic or immunological susceptibility. A standard radioallergosorbent test (RAST) may be used to check a much larger population of antigens.

[0666] In addition, an allergy like test may be used to diagnose the immunological history of a particular individual. For example, by testing the circulating antibodies in a blood sample, which reflects the immunological history and memory of an individual, it may be determined what infections may not have been historically presented to the immune system. In this manner, it may be possible to specifically supplement an immune system for a short period of time with IgG fractions made up of specific types of gamma globulins. Thus, hepatitis gamma globulin injections may be better designed for a particular environment to which a person is expected to be exposed. This also provides the ability to identify genetically equivalent individuals who have immunologically different experiences. Thus, a blood sample from an individual who has a particular combination of circulating antibodies will likely be different from the combination of circulating antibodies found in a genetically similar or identical individual. This could allow for the distinction between clones of particular animals, e.g., mice, rats, or other animals.

[0667] F. Individual Identification

[0668] The present invention provides the ability to fingerprint and identify a genetic individual. This individual may be a bacterium or lower microorganism, as described above in diagnostic tests, or a plant or animal. An individual may be identified genetically or immunologically, as described.

[0669] 1. Genetic

[0670] Genetic fingerprinting has been utilized in comparing different related species in Southern hybridization blots. Genetic fingerprinting has also been used in forensic studies, see, e.g., Morris et al. (1989) J. Forensic Science 34: 1311-1317, and references cited therein. As described above, an individual may be identified genetically by a sufficiently large number of probes. The likelihood that another individual would have an identical pattern over a sufficiently large number of probes may be statistically negligible. However, it is often quite important that a large number of probes be used where the statistical probability of matching is desired to be particularly low. In fact, the probes will optimally be selected for having high heterogeneity among the population. In addition, the fingerprint method may make use of the pattern of homologies indicated by a series of more and more stringent washes. Then, each position has both a sequence specificity and a homology measurement, the combination of which greatly increases the number of dimensions and reduces the statistical likelihood of a perfect pattern match with another genetic individual.
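The dependence of the chance-match probability on the number of probes can be made concrete with a short calculation. The sketch below assumes that the probes behave as statistically independent, so that the per-probe match frequencies multiply; that independence, the example frequency of 0.5, and the function names are assumptions introduced for illustration only.

```python
import math

def random_match_probability(match_freqs):
    """Probability that an unrelated individual matches at every probe,
    assuming the probes are statistically independent."""
    return math.prod(match_freqs)

def probes_needed(per_probe_freq, target):
    """Smallest number of equally informative probes whose combined
    chance-match probability is at or below the target."""
    return math.ceil(math.log(target) / math.log(per_probe_freq))

# Hypothetical case: each probe matches a random individual half of the time.
print(random_match_probability([0.5] * 3))  # 0.125
print(probes_needed(0.5, 1e-6))             # 20
```

Probes selected for high heterogeneity among the population correspond to a lower per-probe match frequency, which reduces the number of probes required for a given target probability.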

[0671] 2. Immunological

[0672] As indicated above in the diagnostic tests, it is possible to identify a particular immune system within a genetically homogeneous class of organisms by virtue of their immunological history. For example, the members of a large colony of cloned mice may be distinguishable by virtue of each mouse's immunological history. For example, one mouse may have had an immunological response to exposure to antigen A to which her genetically identical sibling may have not been exposed. By virtue of this differential history, the first of the pair will likely have a high antibody titer against the antigen A whereas her genetically identical sibling will have not had a response to that antigen by virtue of never having been exposed to it. For this reason, immune systems may be identified by their immunological memories. Thus, immunological experience may also be a means for identifying a particular individual at a particular moment in her lifetime.

[0673] This same immunological screening may be used for other sorts of identifiable biological products. For example, an individual may be identified by her combination of expressed proteins. These proteins may reflect a physiological state of the individual, and would thus be useful in certain circumstances where diagnostic tests may be performed. For example, an individual may be identified, in part, by the presence of particular metabolic products.

[0674] In fact, a plant origin may be determined by virtue of having within its genome an unnatural sequence introduced to it by genetic breeders. Thus, a marker nucleic acid sequence may be introduced as a means to determine whether a genetic strain of a plant or animal originated from another particular source.

[0675] G. Genetic Screening

[0676] 1. Test Alleles with Markers

[0677] The present invention provides for the ability to screen for genetic variations of individuals. For example, a number of genetic diseases are linked with specific alleles. See, e.g., Scriver, C. et al. (eds.) (1989) The Metabolic Basis of Inherited Disease, McGraw-Hill, New York. For instance, cystic fibrosis has been correlated with a specific gene, see, Gregory et al. (1990) Nature 347: 382-386. A number of alleles are correlated with specific genetic deficiencies. See, e.g., McKusick, V. (1990) Genetic Inheritance in Man: Catalogs of Autosomal Dominant, Autosomal Recessive, and X-linked Phenotypes, Johns Hopkins University Press, Baltimore; Ott, J. (1985) Analysis of Human Genetic Linkage, Johns Hopkins University Press, Baltimore; Track, R. et al. (1989) Banbury Report 32: DNA Technology and Forensic Science, Cold Spring Harbor Press, New York; each of which is hereby incorporated herein by reference.

[0678] 2. Amniocentesis

[0679] Typically, amniocentesis is used to determine whether chromosome translocations have occurred. The mapping procedure may provide the means for determining whether these translocations have occurred, and for detecting particular alleles of various markers.

[0680] III. MAPPING

[0681] A. Positionally Located Clones

The present invention allows for the positional location of specific clones useful for mapping. For example, caged biotin may be used for specifically positioning a probe to a location on a matrix pattern.

[0682] In addition, the specific probes may be positionally directed to specific locations on a substrate by targeting. For example, polypeptide specific recognition reagents may be attached to oligonucleotide sequences which can be complementarily targeted to specific locations on a VLSIPS™ Technology substrate. Hybridization conditions, as applied for oligonucleotide probes, will be used to target the reagents to locations on a substrate having complementary oligonucleotides synthesized thereon. In another embodiment, oligonucleotide probes may be attached to specific polypeptide targeting reagents such as an antigen or antibody. These reagents can be directed towards a complementary antigen or antibody already attached to a VLSIPS substrate.

[0683] In another embodiment, an unnatural nucleotide which does not interfere with natural nucleotide complementary hybridization may be used to target oligonucleotides to particular positions on a substrate. Unnatural optical isomers of natural nucleotides should be ideal candidates.

[0684] In this way, short probes may be used to determine the mapping of long targets or long targets may be used to map the position of shorter probes. See, e.g., Craig et al. (1990) Nuc. Acids Res. 18: 2653-2660.
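The use of short probes to map longer targets can be sketched as an overlap analysis: clones that hybridize to a common probe are taken to overlap, and a chain of such overlaps gives the order of the clones. The following is a minimal illustration under stated assumptions; the clone and probe names are hypothetical, the function names are introduced here, and the ordering step assumes that the overlaps form a single unbranched chain, which repeated sequences or noisy data would violate.

```python
from itertools import combinations

def overlap_graph(hits, min_shared=1):
    """Link every pair of clones sharing at least min_shared probes.

    hits maps a clone name to the set of probes that hybridized to it.
    """
    graph = {clone: set() for clone in hits}
    for a, b in combinations(hits, 2):
        if len(hits[a] & hits[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def order_linear_contig(graph):
    """Walk an unbranched chain of overlaps from one end clone to the other."""
    ends = [clone for clone, nbrs in graph.items() if len(nbrs) == 1]
    if len(ends) != 2 or any(len(nbrs) > 2 for nbrs in graph.values()):
        raise ValueError("overlaps do not form a single unbranched chain")
    order, prev, cur = [], None, min(ends)  # start at the alphabetically first end
    while cur is not None:
        order.append(cur)
        onward = [n for n in graph[cur] if n != prev]
        prev, cur = cur, (onward[0] if onward else None)
    if len(order) != len(graph):
        raise ValueError("some clones are not connected to the chain")
    return order

# Hypothetical hybridization results: clone -> probes that gave a signal.
hits = {
    "clone_C": {"p3", "p4"},
    "clone_A": {"p1", "p2"},
    "clone_D": {"p4", "p5"},
    "clone_B": {"p2", "p3"},
}
order = order_linear_contig(overlap_graph(hits))
print(order)  # ['clone_A', 'clone_B', 'clone_C', 'clone_D']
```

The same bookkeeping applies in the converse case, with the roles of probes and targets exchanged.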

[0685] B. Positionally Defined Clones

[0686] Positionally defined clones may be transferred to a new substrate by either physical transfer or by synthetic means. Synthetic means may involve either a production of the probe on the substrate using the VLSIPS™ Technology synthetic methods, or may involve the attachment of a targeting sequence made by VLSIPS synthetic methods which will target that positionally defined clone to a position on a new substrate. Both methods will provide a substrate having a number of positionally defined probes useful in mapping.

[0687] IX. Conclusion

[0688] The present inventions provide greatly improved methods and apparatus for synthesis of polymers on substrates. It is to be understood that the above description is intended to be illustrative and not restrictive. Many embodiments will be apparent to those of skill in the art upon reviewing the above description. By way of example, the invention has been described primarily with reference to the use of photoremovable protective groups, but it will be readily recognized by those of skill in the art that sources of radiation other than light could also be used. For example, in some embodiments it may be desirable to use protective groups which are sensitive to electron beam irradiation or x-ray irradiation, in combination with electron beam lithography or x-ray lithography techniques. Alternatively, the group could be removed by exposure to an electric current. The scope of the invention should, therefore, be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

[0689] All publications and patent applications referred to herein are incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually incorporated by reference. The present invention now being fully described, it will be apparent to one of ordinary skill in the art that many changes and modifications can be made thereto without departing from the spirit or scope of the appended claims.

What is claimed is:
 1. A composition comprising a plurality of positionally distinguishable sequence specific reagents attached to a solid substrate, which reagents are capable of specifically binding to a predetermined subunit sequence of a preselected multi-subunit length having at least three subunits, said reagents representing substantially all possible sequences of said preselected length.
 2. A composition of claim 1, wherein said subunit sequence is a polynucleotide or a polypeptide.
 3. A composition of claim 1, wherein said preselected multi-subunit length is five subunits and said subunit sequence is a polynucleotide sequence.
 4. A composition of claim 1, wherein said specific reagent is an oligonucleotide of at least about five nucleotides.
 5. A composition of claim 1, wherein said specific reagent is a monoclonal antibody.
 6. A composition of claim 1, wherein said specific reagents are all attached to a single solid substrate.
 7. A composition of claim 1, wherein said reagents comprise about 3000 different sequences.
 8. A composition of claim 1, wherein said reagents represent at least about 25% of the possible subsequences of said preselected length.
 9. A composition of claim 1, wherein said reagents are localized in regions of the substrate having a density of at least 25 regions per square centimeter.
 10. A composition of claim 6, wherein said substrate has a surface area of less than about 4 square centimeters.
 11. A method of analyzing a sequence of a polynucleotide or a polypeptide, said method comprising the step of: a) exposing said polynucleotide or polypeptide to a composition of claim 1.
 12. A method of identifying or comparing a target sequence with a reference, said method comprising the step of: a) exposing said target sequence to a composition of claim 1; b) determining the pattern of positions of said reagents which specifically interact with said target sequence; and c) comparing said pattern with the pattern exhibited by said reference when exposed to said composition.
 13. A method for sequencing a segment of a polynucleotide comprising the steps of: a) combining: i) a substrate comprising a plurality of chemically synthesized and positionally distinguishable oligonucleotides capable of recognizing defined oligonucleotide sequences; and ii) a target polynucleotide; thereby forming high fidelity matched duplex structures of complementary subsequences of known sequence; and b) determining which of said reagents have specifically interacted with subsequences in said target polynucleotide.
 14. A method of claim 13, wherein said segment is substantially the entire length of said polynucleotide.
 15. A method for sequencing a polymer, said method comprising the steps of: a) preparing a plurality of reagents which each specifically bind to a subsequence of preselected length; b) positionally attaching each of said reagents to one or more solid phase substrates, thereby producing substrates of positionally definable sequence specific probes; c) combining said substrates with a target polymer whose sequence is to be determined; and d) determining which of said reagents have specifically interacted with subsequences in said target polymer.
 16. A method of claim 15, wherein said substrates are beads.
 17. A method of claim 15, wherein said plurality of reagents comprise substantially all possible subsequences of said preselected length found in said target.
 18. A method of claim 15, wherein said solid phase substrates are a single substrate having attached thereto reagents recognizing substantially all possible subsequences of preselected length found in said target.
 19. A method of claim 15, further comprising the step of analyzing a plurality of said recognized subsequences to assemble a sequence of said target polymer.
 20. A method of claim 16, wherein at least some of said plurality of substrates have one subsequence specific reagent attached thereto, and said substrates are coded to indicate the specificity of said reagent.
 21. A method of using a fluorescent nucleotide to detect interactions with oligonucleotide probes of known sequence, said method comprising: a) attaching said nucleotide to a target unknown polynucleotide sequence, and b) exposing said target polynucleotide sequence to a collection of positionally defined oligonucleotide probes of known sequences to determine the sequences of said probes which interact with said target.
 22. A method of claim 21, further comprising the step of: a) collating said known sequences to determine the overlaps of said known sequences to determine the sequence of said target sequence.
 23. A method of mapping a plurality of sequences relative to one another, said method comprising: a) preparing a substrate having a plurality of positionally attached sequence specific probes; b) exposing each of said sequences to said substrate, thereby determining the patterns of interaction between said sequence specific probes and said sequences; and c) determining the relative locations of said sequence specific probe interactions on said sequences to determine the overlaps and order of said sequences.
 24. A method of claim 23, wherein said sequence specific probes are oligonucleotides.
 25. A method of claim 23, wherein said sequences are nucleic acid sequences.